<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Informatique
et les Techniques Avancées, Paris, France
luigi.bambaci@ephe.psl.eu (L. Bambaci); daniel.stoekl@ephe.psl.eu (D. Stökl Ben Ezra)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing HTR of Historical Texts through Scholarly Editions: A Case Study from an Ancient Collation of the Hebrew Bible</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luigi Bambaci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Stökl Ben Ezra</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Archéologie &amp; Philologie d'Orient et d'Occident UMR 8546, École Pratique des Hautes Études, Université Paris Sciences &amp; Lettres (EPHE, PSL)</institution>
          ,
          <addr-line>Les Patios Saint-Jacques, 4-14 Rue Ferrus, 75014 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>9</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Printed critical editions of literary texts are a largely neglected source of knowledge in computational humanities. However, under certain conditions, they hold significant potential for multifaceted exploration: First, through Optical Character Recognition (OCR) of the text and its apparatus, coupled with intelligent parsing of the variant readings, it becomes possible to reconstruct comprehensive manuscript collations, which can prove invaluable for a variety of investigations, including phylogenetic analyses, redaction history studies, linguistic inquiries, and more. Second, by aligning the printed edition with manuscript images, a substantial amount of Handwritten Text Recognition (HTR) ground truth can be generated. This serves as valuable material for paleography and layout analysis, as well as for assessing the quality of the collation criteria adopted by the editor. The present paper focuses on the challenges we addressed in the processes of OCR, apparatus parsing, text reconstruction, and alignment with the manuscript images, taking as a case study the edition of the Hebrew Bible published by Kennicott in the late eighteenth century.</p>
      </abstract>
      <kwd-group>
        <kwd>layout analysis</kwd>
        <kwd>automatic transcription</kwd>
        <kwd>text encoding</kwd>
        <kwd>Hebrew Bible manuscripts</kwd>
        <kwd>textual criticism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>For centuries, critical editions have served as the backbone of the humanities far beyond
philology, offering important insights into the textual evolution of numerous historical works and
providing scholars with reliable texts for their academic inquiries. The advent of Optical
Character Recognition (OCR) and Handwritten Text Recognition (HTR) technologies as well as
Natural Language Processing (NLP) has opened a new era both in the preservation and in the
exploration of these indispensable works.</p>
      <p>
        Numerous OCR and HTR software solutions are available today, and multiple studies and
projects have contributed to the advancement of digitizing and analyzing the cultural book
heritage. Among the most well-known software, we can mention Transkribus [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ], Monk [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ],
Aletheia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and Tesseract 4.0.¹ Notable research efforts relevant to ours include the work
by Toselli et al. [26], who exploited huge datasets of existing OCRed printed books for
self-supervised layout analysis, as well as projects like HORAE [5], which examined large amounts
of Biblical texts or quotations in Latin to create HTR ground truth and conduct manuscript
analysis.
      </p>
      <p>The focus of these advancements has predominantly revolved around traditions within
classical languages or modern languages in the Latin alphabet. However, there has been a recent
shift towards including Hebrew texts in such endeavors: one eminent example is the BiblIA
project [25], which examined a substantial corpus of medieval manuscripts in Hebrew script,
providing the first public dataset of transcriptions as well as efficient models for automatic
segmentation and text recognition.</p>
      <p>In this paper, we aim to contribute to the ongoing progress in digital Hebrew research,
focusing in particular on the corpus of scholarly editions and biblical manuscripts. We will delve
into the challenges faced in digitizing and encoding an ancient edition of the Hebrew Bible,
namely the eighteenth-century collation by Benjamin Kennicott, and we will elucidate how
we extracted from it a large amount of complete manuscript texts that we will be able to align
with their manuscript images.</p>
      <p>The significance of Kennicott’s collation for biblical studies remains unparalleled. The wealth
of data it offers is exceptional and its potential applications are manifold, as we will elaborate
shortly (§ 2). Yet, the sheer volume and complexity of the data constitute a significant obstacle
to analysis, compelling scholars to work on limited samples and to perform laborious manual
processing.</p>
      <p>Through the digitization of this edition, our aim is to provide the scholarly community with
a digital resource for swift, efficient, and large-scale examinations of Hebrew Bible manuscripts.
Additionally, we will leverage Kennicott’s collation for an unprecedented purpose: enhancing
the performance of HTR systems using automatically reconstructed texts derived from the
critical apparatus data. The automatic generation of these texts will afford us a massive amount
(approximately 75,000 pages) of ground truth for HTR of Hebrew manuscripts, while the
alignment with the images will enable us not only to measure the degree of discrepancy between the
original collation and the actual manuscripts, but also to correct errors, fill gaps, and produce
more faithful and updated collation data.</p>
      <p>In the next sections, we will provide a detailed account of our work. We will elaborate on
how we conducted layout analysis for the purpose of segmentation and transcription (§§ 3.2,
3.3), and how we automatically encoded the data present in the critical apparatus using a
rule-based parser (§ 3.4). Lastly, we will demonstrate how we successfully generated complete texts
of fully collated witnesses and how we are going to use them as training data to improve and
speed up the automatic transcription of a number of manuscripts of the Hebrew Bible (§ 4).</p>
      <p>The method we are about to present here is part of an ongoing project entitled Reverse
Engineering Kennicott (REK), funded by Biblissima+² and directed by the École Pratique des
Hautes Études, Paris Sciences et Lettres University. The project is carried out in close synergy
with Ktiv,³ the most important online catalog of Hebrew manuscripts, and with the National
Library of Israel,⁴ and is centered on the web application eScriptorium.⁵
¹https://github.com/tesseract-ocr/tesseract.
²https://biblissima.fr/.</p>
      <p>At the time of writing this article, the first of the two volumes of Kennicott’s work has
been encoded, and the texts of the witnesses of the book of Genesis have been automatically
generated.</p>
      <p>Before illustrating the pipeline of our project, let us briefly outline Kennicott’s work, in order
to explain why it is so important for biblical research and how it can be used to fully recover
the text of medieval manuscripts.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Kennicott’s collation of the Hebrew Bible</title>
      <p>The Hebrew Bible is a compilation of texts from the first millennium before the common era,
written primarily in Hebrew with some sections in Aramaic, and totaling around 470,000 tokens.
A sacred text in Judaism and later also in Christianity, with numerous translations into many
ancient languages such as Greek, Latin, Aramaic, Armenian, Coptic, Georgian, Arabic and, since
the Reformation era, into virtually all contemporary languages, it is one of the most important
texts extant worldwide.</p>
      <p>Kennicott was the first scholar to systematically gather and collate the Hebrew textual
witnesses of the Bible.</p>
      <p>
        His two-volume collation, published at Oxford between 1776 and 1780 and titled Vetus
Testamentum Hebraicum cum variis lectionibus [
        <xref ref-type="bibr" rid="ref15 ref16">14, 15</xref>
        ], remains the largest of its kind to this day: its
extensive critical apparatus, built upon the examination of no fewer than 600 manuscripts and
70 printed editions (Fig. 1), is estimated to contain something like 1,500,000 pieces of textual
information.⁶
³https://www.nli.org.il/en/discover/manuscripts/hebrew-manuscripts.
⁴https://www.nli.org.il/en.
⁵https://msia.escriptorium.fr/.
      </p>
      <p>
        Kennicott’s work has never been replaced: De Rossi’s collations [
        <xref ref-type="bibr" rid="ref10">10, 9</xref>
        ], which were
published shortly afterwards, are highly eclectic and present only a restricted selection of variants,
while later editions either depend on these classical collations,⁷ or drastically reduce the
number of collated manuscripts,⁸ or even dispense with the testimony of medieval manuscripts
altogether.⁹
      </p>
      <p>
        The use of Kennicott’s data is not confined solely to consultation or the compilation of critical
editions. Scholars have repeatedly demonstrated how it is possible to extract relevant research
information out of Kennicott’s apparatus: from textual history, enabling the reconstruction
of the transmission process of the Hebrew Bible in the Middle Ages through stemmatological
methods, such as clustering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and phylogenetics [
        <xref ref-type="bibr" rid="ref2 ref3 ref9">3, 2</xref>
        ]; through philology, for the study of common
copying errors and scribal habits; and codicology and paleography, aiding in dating and
localizing new manuscripts [
        <xref ref-type="bibr" rid="ref13 ref20">19, 12</xref>
        ]; to linguistics, allowing the analysis of variant spelling and
orthography [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Kennicott’s collation is a valuable resource for research across all these domains. Indeed, it
stands as the first and, to date, only endeavor to provide a scholarly edition of
the medieval Hebrew Bible text. Let us take a closer look at some of its key features.</p>
      <p>The work is organized into sections, each dedicated to a biblical book (e.g. Genesis, Exodus
etc.) or to a collection of biblical books (e.g. the Five Megilloth: Song of Songs, Ruth etc.).
Each section comprises two main parts: a reference text¹⁰ printed at the top of the page and a
critical apparatus of variants printed at the bottom (Fig. 2).</p>
      <p>In the apparatus, the witnesses are cited using unique alphanumeric sigla. Keys to these
sigla are provided in the catalog reproduced in the introduction to the first volume, containing
the most relevant bibliographical information, such as date and provenance.¹¹</p>
      <p>
        In addition to this catalog, Kennicott provides recapitulative lists of witnesses at the end
of each book or collection of books. The purpose of these lists is to categorize the witnesses
into manuscripts and printed editions, as well as to signal which of them have been collated
in full (per totum collati) and which only partially (in locis selectis collati). This distinction by
degree of collation, which is absent, for example, in De Rossi, is of fundamental importance
and directly impacts our work: since only fully collated witnesses can provide the basis for
a systematic gathering of variants, it permits us to identify the witnesses for which we can
reasonably expect to obtain complete and reliable automatic transcriptions.
⁶[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] 28ff.
⁷So, for example, the Biblia Hebraica Stuttgartensia [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ].
⁸Like the Hebrew University Bible [
        <xref ref-type="bibr" rid="ref24">22</xref>
        ].
⁹As the Biblia Hebraica Quinta [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ].
¹⁰Taken from the most widely used edition at the time, that of E. van der Hooght (Amsterdam 1705), which Kennicott
adopted as the basis for his collation.
¹¹Most of Kennicott’s manuscripts have been identified: in 2020, Idan Dershowitz published a comprehensive list of
these manuscripts containing URLs to Ktiv, where updated bibliographic information and, when available, images
can be found. This list is accessible on the author’s academia.edu page: https://www.academia.edu/37862623.
      </p>
      <p>Such a systematic approach towards collation is the hallmark of Kennicott’s method. In
contrast to what De Rossi would later do, Kennicott goes beyond the most conspicuous phenomena
of variation, encompassing all potential discrepancies between the reference text and each
individual witness, such as spelling, the layout of paratextual elements, and various details of the
mise en page. This choice, however philologically questionable, actually benefits us by assuring,
at least theoretically and net of inconsistencies, errors, and omissions, that we have complete
lists of variants at our disposal.</p>
      <p>Finally, but most importantly, Kennicott organizes the variants in the apparatus in an
extremely precise manner, minimizing the use of natural language and adopting a formalism
that anticipates that of most recent editions. On this aspect, which is crucial for automatically
extracting information from the critical apparatus, we will dwell at length later on (§ 3.4.1).</p>
      <p>The features we have just listed effectively make Kennicott’s work not only a rich source of
data on the textual tradition of the Hebrew Bible, but also an ideal candidate for our
computational treatment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Pipeline</title>
      <p>REK’s main objectives are threefold:
1. to obtain a TEI-compliant encoding of both the reference text and the critical apparatus
of Kennicott;
2. to reconstruct the text of 244 manuscripts fully automatically by way of encoding, for a
total of approx. 75,000 pages;
3. to provide an accurate and complete transcription of the text of 10 of Kennicott’s
manuscripts (approx. 7,500 pages) through alignment with these automatically reconstructed
texts.
To achieve these objectives, we devised the following 4-step pipeline:
1. acquisition of images of Kennicott’s Vetus Testamentum and of the 10 chosen
manuscripts;
2. automatic segmentation and transcription;
3. parsing and encoding of Kennicott’s apparatus;
4. reconstruction of the witness texts.</p>
      <p>We will now discuss each of these points in detail, presenting the work done as well as
outlining what is yet to be accomplished. Let us begin with the first step, image acquisition.</p>
      <sec id="sec-3-1">
        <title>3.1. Image acquisition</title>
        <p>Digital copies of Kennicott’s Vetus Testamentum are freely available on the web on platforms
such as Archive.org and Google Books, both in .pdf format and in various image formats. We
chose the .jp2 images from Archive.org,¹² which are in an acceptable resolution, and converted
them to .jpeg, which is most widely supported and produces smaller file sizes that still suffice
for OCR.
¹²First volume: https://archive.org/details/vetustestamentum01kenn; second volume: https://archive.org/details/vetustestamentum02kenn.</p>
        <p>Among the manuscripts collated by Kennicott, we have identified about 20 that are important
for their variants. Among these, we have selected 10, based on criteria of convenience such
as simple layout, the absence of inline translations into targumic Aramaic, and, of course, the
availability of the images (Tab. 1).</p>
        <p>
          Among the different software mentioned in the Introduction, we have chosen to work with
eScriptorium [
          <xref ref-type="bibr" rid="ref18 ref25">24, 17, 23</xref>
          ] and its OCR/HTR engine, Kraken [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ],¹³ which is optimized for
historical and non-Latin script material.
        </p>
        <p>To upload the images of these manuscripts into eScriptorium, we made use of the IIIF
standard: for each chosen manuscript, we retrieved the IIIF manifest and then we used Python
scripts to download the images and populate our database.</p>
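        <p>The manifest-walking step can be sketched as follows (a minimal sketch for IIIF Presentation API v2 manifests; the project's actual scripts are not published, and the function name is ours). Each canvas in a v2 manifest carries its full-size image URL in the image annotation's resource:</p>
        <p>
```python
def image_urls_from_manifest(manifest: dict) -> list[str]:
    """Collect full-size image URLs from a IIIF Presentation v2 manifest.

    Walks sequences, then canvases, then image annotations, taking the
    "@id" of each image resource.
    """
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls
```
        </p>
        <p>The returned URLs can then be fetched one by one (e.g. with urllib or requests) and the files registered in the database alongside their manuscript shelfmarks.</p>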
        <p>In the next section, we will discuss the segmentation (§ 3.2) and transcription process (§ 3.3).
For the sake of clarity, we will devote separate subsections to segmentation and
transcription of the Vetus Testamentum (§§ 3.2.1, 3.3.1) and of Kennicott’s manuscripts (§§ 3.2.2, 3.3.2),
respectively.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Segmentation</title>
        <p>Once we uploaded the images of the Vetus Testamentum and the manuscripts onto eScriptorium,
we proceeded with segmentation, which is indispensable for identifying those regions on the
page where the text to be transcribed is located.</p>
        <p>For both segmentation and transcription, we used models in the .mlmodel format trained
with the Kraken software. These models can be trained with Kraken and then imported into
eScriptorium. Alternatively, as in our case, they can be trained directly within the eScriptorium
application.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Vetus Testamentum</title>
          <p>The layout of the Vetus Testamentum is complex, but the segmentation was relatively
straightforward. We started by defining a segmentation ontology, distinguishing running headers,
titles, left and right main columns, and left and right apparatus for both region types and line
types. Following this, we manually segmented approx. 30 pages and trained a model on this
sample. With this model, we were able to automatically segment the entire first volume,
keeping manual corrections to a bare minimum. Fig. 3 shows an example of segmentation of regions
(3a) and lines (3b) taken from the first volume.</p>
          <p>As can be seen, eScriptorium provides an intuitive graphical interface that allows the
creation of an ontology to distinguish between different types of regions and lines, which are
represented by different colors. This feature is extremely useful, as it enabled us to mark only
the portions of text for which we wanted to obtain a transcription, namely the reference text
(the two regions at the top) and the critical apparatus (the two regions at the bottom), while
excluding titles, headers, page numbers, and catchwords.</p>
          <p>Similarly, by marking the types of lines, we can express the order of columns and the textual
flow. This permitted us to differentiate between lines we need to transcribe and those we do
not (e.g., the Samaritan text with its variants, see Fig. 2).</p>
          <p>In addition to its user-friendly graphical interface, eScriptorium offers a rich API, which
makes it possible to automate numerous segmentation- and transcription-related operations.
Using the API functions, we opted to replace the polygonal line boundaries with
parallelogrammatic ones, as they were found to enhance transcription accuracy (Fig. 4).</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Kennicott’s manuscripts</title>
          <p>We have applied the same segmentation procedures to the medieval manuscripts. Unlike the
Vetus Testamentum, which required us to create our own models from scratch, there already
exist excellent segmenters as well as recognizers for Hebrew manuscripts, and ongoing research
in this area continually improves their accuracy.¹⁴ Only occasionally, for manuscripts with a
less regular layout, did we have to train new models on top of these standard models.</p>
          <p>An instance of automatic segmentation for one of the 10 manuscripts in our possession can
be seen in Fig. 5.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Transcription</title>
        <p>We proceeded with transcription next. Currently, we have completed the transcription of both
the reference text and the critical apparatus of Kennicott’s first volume, and we are now in the
process of transcribing the manuscript texts.
¹⁴The segmentation models we used are accessible here: https://github.com/dstoekl/sofer_mahir and the
recognition models here: https://zenodo.org/record/5167263#.YhzNEtIo-po.
Figure 3: Layout analysis from the collation of the book of Genesis. (a) Region segmentation; (b) Line segmentation.</p>
        <p>Figure 4: (a) Before repolygonization; (b) After repolygonization.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Vetus Testamentum</title>
          <p>Transcribing the text of the Vetus Testamentum posed numerous challenges. The reference
text and the critical apparatus follow two distinct textual flows, each with its own peculiarities
and complexities, and require different treatments. We opted, therefore, to transcribe them
separately.</p>
          <p>The main complexity of the critical apparatus lies in the presence of two different
alphabets (Hebrew and Latin) with distinct directionality (right-to-left and left-to-right), as well as
of punctuation, numbers, and special symbols that require exact reproduction for proper
parsing (§ 3.4.1). Dealing with directionality proved particularly demanding, since RTL and LTR
markers are invisible and therefore difficult to manage during correction. We successfully
overcame this obstacle by employing a visible LTR marker to establish proper word order. After
transcribing manually a sample of about 30 pages and training a recognition model on these
sample pages, we finally managed to achieve a satisfactory accuracy of approx. 98%.¹⁵ Thanks
to the introduction of the LTR marker, the resulting transcriptions became much more
manageable to correct.</p>
          <p>Transcribing the reference text, on the other hand, proved notably smoother, since it is in a
single alphabet, Hebrew,¹⁶ and since it reproduces a standard text, that of the Hebrew Bible, for
which excellent transcription models, as we mentioned (§ 3.2.2), already exist. The combination
of these features resulted in an accuracy of 98%.
¹⁵From here on, the accuracy percentages we provide for transcription are based on the Character Error Rate (CER)
metric, which is the one used by Kraken.
¹⁶Excluding verse and chapter numbers, which were added in post-processing, see § 3.4.</p>
          <p>As for the correction of the reference text, we took advantage of the recent integration of
passim’s¹⁷ text-to-text alignment into eScriptorium, which allows loading an external version
of the same text and aligning it with the output of the automatic transcription. This alignment
significantly expedited the correction process: As depicted in Fig. 6, differences between the
aligned versions are highlighted (deletions in red and additions in green), enabling easy
identification of errors as well as variants. The exceptional benefits of this tool are evident, and we
are confident that it will prove immensely helpful also for the correction of the reconstructed
textual witnesses (§ 4).</p>
          <p>Before going on to describe the treatment of medieval manuscripts, it is only right to spend
a few words about the manual correction process, which is by far the most time-consuming for
the human user.</p>
          <p>The graphical interface of eScriptorium is designed to make the manual correction process
easier: As shown in Fig. 7, eScriptorium enables the user to scroll through the text line by
line, with the original image alongside the result of the automatic transcription. Additionally,
eScriptorium allows for the creation of customizable keyboards, which can be used to insert
characters that are not easily reproducible otherwise. This utility proved exceptionally
convenient for correcting the critical apparatus, which, as mentioned, contains many of these special
characters.</p>
          <p>Once we obtained correct transcriptions for both the reference text and the critical apparatus,
we exported them using eScriptorium’s API, so as to have pairs of .txt files (text + apparatus)
for each treated biblical book (Figs. 8a and 8b).</p>
          <p>Finally, we post-processed these files (removing hyphenations, regularizing newlines etc.) to
obtain copies suitable for automatic encoding (§ 3.4). Examples of these post-processed texts
are visible in Figs. 9a and 9b.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Kennicott’s manuscripts</title>
          <p>We are presently working on transcribing the 10 Kennicott manuscripts (Fig. 10), using the
models mentioned in Section 3.2.2. When we have their text, we plan to utilize the same
alignment feature discussed in Section 3.3.1, which was used for transcribing Kennicott’s reference
text. This will help us locate transcription errors more effectively and speed up correction.</p>
          <p>Upon completing the transcription process, we intend to align these texts with the texts
automatically reconstructed from Kennicott’s data. Subsequent sections will explain the details
of this reconstruction.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. XML encoding</title>
        <p>In order to process the data from a scholarly edition, mere machine-readability in the provided
.txt files is insufficient. It is crucial for the data to be machine-actionable to enable automated
processing and analysis. To achieve this essential feature, we opted for XML encoding, the
most widely adopted practice in Digital Scholarly Editing.</p>
        <p>As for the encoding of Kennicott’s reference text, we used simple Python scripts to first
divide the text of each biblical book into its hierarchical units, i.e. chapters, verses, and words.
Then, we compared these segmented texts with a standard digital version of the Bible in order
to determine the exact number of chapters and verses. An extract of encoded reference text is
shown in Fig. 11.</p>
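        <p>The hierarchical encoding step can be sketched as follows (element and attribute names here are illustrative, not the project's actual TEI schema): a verse is tokenized on whitespace and each word wrapped in its own numbered element, so that lemmata can later be addressed by position.</p>
        <p>
```python
import xml.etree.ElementTree as ET


def encode_verse(book: str, chapter: int, verse: int, text: str) -> ET.Element:
    """Encode one verse as an ab element containing numbered w elements.

    Hypothetical element names; a real TEI encoding would follow the
    project's published schema.
    """
    ab = ET.Element("ab", {"n": f"{book} {chapter}:{verse}"})
    for i, token in enumerate(text.split(), start=1):
        w = ET.SubElement(ab, "w", {"n": str(i)})
        w.text = token
    return ab
```
        </p>
        <p>Numbering the words gives each token a stable address (book, chapter, verse, word position), which is exactly what the apparatus mapping in § 4 needs as a foreign key.</p>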
        <p>
          The encoding of the critical apparatus was much more complex. We decided to follow and
extend the methodology outlined in Bambaci [
          <xref ref-type="bibr" rid="ref1">2, 1</xref>
          ], which involves the development of a
rule-based parser for automatic encoding. A detailed account of this methodology can be found
there. In the following subsection, we will highlight the key points necessary for readers to
understand how we manage to obtain XML files out of Kennicott’s critical apparatus.
        </p>
        <p>3.4.1. Parsing the critical apparatus</p>
        <p>
          As anticipated in Section 2, Kennicott’s critical apparatus proves to be highly suitable for
automatic parsing due to its rigorous language and structured presentation of variants. Instead
of using Latin commentary-like notes like De Rossi, Kennicott employs a highly formalized
language, in which each element performs a precise function according to the position it
occupies in the overall structure and according to the class of strings (letters, numbers, symbols) to
which it belongs. Both the position and the class of strings can be “captured” by the rules of
a Context-Free Grammar (CFG), and these rules “fed” to the parser in order to recognize the
function of the individual apparatus components.
        </p>
        <p>Let us consider a fragment of the apparatus as shown in Fig. 12.</p>
        <p>For simplicity, let us focus on the first apparatus entry only:</p>
        <p>5. אלהים — אלהי 109.</p>
        <p>which informs us of the substitution of ‘אלהים’ with ‘אלהי’ in manuscript no. 109. The
philologist will immediately recognise the following elements: the place of variation, expressed
by the verse number (‘5’), separated by a dot (‘.’); the lemma of the reference text, expressed
in Hebrew letters (‘אלהים’) and separated by a horizontal line (‘—’); the variant (‘אלהי’); the
numerical siglum of the manuscript (‘109’); and finally a dot followed by a long white space
(encoded as a tabulation, see below), which closes the apparatus entry.</p>
        <p>
          1 grammar kennicottCFG;
2 all: app;
3 app: loc lem var appSep;
4 loc: verse locSep;
5 lem: w lemSep;
6 var: w wit;
7 verse: NUM;
8 locSep: DOT;
9 w: HEBW;
10 lemSep: DASH;
11 wit: NUM;
12 appSep: DOT TAB;
13 NUM: [
          <xref ref-type="bibr" rid="ref1 ref10 ref2 ref3 ref4 ref6 ref7 ref8 ref9">0−9</xref>
          ]+;
14 HEBW: [\u0590−\u05ff]+;
15 DASH: '—';
16 DOT: '.';
17 TAB: '\t';
18 WHITESPACE : ' ' −&gt; skip;
        </p>
        <p>A CFG such as the one shown in Fig. 13 can be formulated¹⁸ in order to describe this apparatus
entry and instruct the parser on how to recognize its individual elements correctly.</p>
        <p>With the first rule (all) we describe the structure of the entire document, which in our
example consists of a single apparatus entry, which we call app. This in turn consists of a
variant location (loc), a lemma (lem), a separator for the lemma (lemSep), a variant (var), a
witness number (wit), and finally a separator for the apparatus (appSep). A variant location
consists in turn of a sequence of numbers (NUM); lemma and variant contain Hebrew words
(HEBW); separators consist of horizontal bars (DASH), dots (DOT), and tabulations (TAB).</p>
        <p>With the first sequence of rules we established, the so-called parsing rules, we are able to
define the order of succession of the elements (that is, their syntax), as well as to express
their function using “speaking” names that make their meaning explicit for the philologist.
With the second sequence of rules, called tokenization rules, we instead indicate the class of
strings to which the individual elements belong, such as numerals ([0-9]), alphabetical letters
([\u0590-\u05ff]), punctuation etc.</p>
        <p>
          By employing a CFG akin to the illustrated fragment and using the ANTLR4 software [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ],¹⁹ we
were able to automatically encode the entire critical apparatus of the first volume into XML,
with minimum cost and very high accuracy (around 98%).²⁰
¹⁸This CFG is designed just for explanation purposes. The CFG we used to parse Kennicott’s apparatus is much
more complex and will be published, along with all the relevant material, upon completion of the project (§ 4).
¹⁹https://www.antlr.org/.
²⁰This percentage is indicative and is calculated for the book of Genesis by simply dividing the total number of
XML elements correctly assigned by the parser by the total number of XML elements found in this book. To
identify errors and derive the correct elements through subtraction, we use the element &lt;lem&gt; (lemma) as the
unit of measurement. Here is the calculation: In Genesis, the total number of lemmata amounts to 6,866; out
of these, 146 are cases of lemmata erroneously interpreted as readings (&lt;rdg&gt;) due to syntactic ambiguity; the
parser correctly identified 6,720 lemmata; the accuracy is therefore equal to (6,866 − 146) / 6,866 × 100 ≈ 97.87%.
An extract of XML code of the apparatus is shown in Fig. 14.
        </p>
        <p>Once encoded, the reference text and the apparatus are ready for the
reconstruction of the witness texts, which is our ultimate goal.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Text reconstruction</title>
      <p>To reconstruct the witnesses, all variants in the apparatus must first be mapped onto the
reference text, using the lemmata, as it were, as foreign keys. Once the mapping has been performed,
our textual reconstruction proceeds simply by replacing, for each manuscript, the lemma in the
reference text with the variant in the apparatus.</p>
      <p>Reference text and apparatus (Gen 1:5):</p>
      <p>1:5 ויקרא אלהים לאור יום ולחשך קרא לילה...
5. אלהים – אלהי 109. ולחושך 152, 206. בוקר 9.</p>
      <p>Reconstructed text of ms. no. 109:</p>
      <p>1:5 ויקרא אלהי לאור יום ולחשך קרא לילה</p>
      <p>An example of such a procedure is shown in Tab. 2 (see also Fig. 12), where the lemma
‘אלהים’ corresponds to the variant ‘אלהי’ in manuscript no. 109. Textual reconstruction is
straightforward here: using Python, we simply replace the lemma with the variant, as shown.
Cases like this, where each apparatus entry corresponds to one and only one lemma, are the
easiest to deal with and constitute the majority in Kennicott, accounting for about 70% of the
entries.</p>
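      <p>A minimal sketch of this one-lemma-per-entry case in Python (the data structures and names are illustrative assumptions, not the project’s actual code):</p>

```python
# For each manuscript, replace the lemma in the reference text with that
# manuscript's variant reading, leaving all other words untouched.
reference = "ויקרא אלהים לאור יום ולחשך קרא לילה"  # Gen 1:5 (consonantal)
apparatus = {
    # lemma -> {manuscript number: variant reading}
    "אלהים": {109: "אלהי"},
    "ולחשך": {152: "ולחושך", 206: "ולחושך"},
}

def reconstruct(reference: str, apparatus: dict, ms: int) -> str:
    """Rebuild the text of one witness by substituting its variants."""
    return " ".join(apparatus.get(word, {}).get(ms, word)
                    for word in reference.split())

print(reconstruct(reference, apparatus, 109))
```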
      <p>In the remaining 30% of cases, on the other hand, we do not have any lemma provided,
and we need to deduce it from the reference text before we can map the variants. Automatic
deduction was possible in all but 3% of the cases.</p>
      <p>Let us give one example of the most common case (Tab. 3).</p>
      <p>Reconstructed text of mss. nos. 152, 206:</p>
      <p>1:5 ויקרא אלהים לאור יום ולחושך קרא לילה...</p>
      <p>In the apparatus, as shown, the variant ‘ולחושך’ is cited for manuscripts nos. 152 and 206,
but the lemma of the reference text, ‘ולחשך’, is missing. Owing to the very close proximity of the
two words (only one character of difference), the reference is immediate for the human reader,
and for this reason it is omitted. To make this information available to the machine, we use the
Levenshtein distance (or edit distance), which returns the correct solution in our example
(‘ולחושך’, with an edit distance of 1). Such an approach proved to be quite effective for our case
study: 60% of all the variants in Kennicott are in fact graphical variants that involve only a few
letters.</p>
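      <p>The deduction step can be sketched as follows, under the simplifying assumption that candidate lemmata are single words of the verse (the project’s actual implementation may differ):</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def deduce_lemma(variant: str, verse_words: list) -> str:
    """Guess the omitted lemma: the reference word closest to the variant."""
    return min(verse_words, key=lambda w: levenshtein(variant, w))

verse = "ויקרא אלהים לאור יום ולחשך קרא לילה".split()
print(deduce_lemma("ולחושך", verse))  # the closest word, at distance 1
```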
      <p>There are cases, however, where this approach returns multiple outputs with equal distance
value, as well as more complex cases where the lemma spans across two or more verses, or
where the lemma is not given explicitly in Hebrew, but is rather described by Latin phrases. At
the time of writing this article, such residual cases account for about 3% of the total, but we are
further improving their automatic treatment.</p>
      <p>Using the procedures just described, we have been able to reconstruct the full text of 114
witnesses of the book of Genesis, including 97 manuscripts and 17 printed editions. We are
now working on generating transcriptions for the entire Enneateuch (from Genesis to Kings,
corresponding to Kennicott’s first volume), which means an average of 100 witnesses per
biblical book and approx. 35,000 manuscript pages obtained in a fully automatic manner (Fig. 15a
and Tab. 4).</p>
      <p>Next, we will align these automatically generated texts with the automatic transcriptions of
Kennicott’s manuscripts, and then we will correct them using the eScriptorium alignment feature
discussed in Section 3.3.1. After correction, we will have approx. 7,500 pages of text, which
will allow us to train new and accurate models for automatic text recognition of Hebrew
manuscripts.</p>
      <p>Finally, once we have the XML files of all the relevant texts (Kennicott’s reference text and
apparatus, the reconstructed witness texts, etc.), we will convert our custom XML language
to the TEI standards using XSLT, in order to ensure data interchangeability and reusability.</p>
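      <p>That conversion amounts to mapping our custom element names onto their TEI equivalents. A Python analogue of the mapping for a single, hypothetical apparatus entry (the input element names are our assumptions; <code>app</code>/<code>lem</code>/<code>rdg</code> and the <code>@wit</code> pointer are standard TEI):</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical custom encoding of one apparatus entry.
custom = ET.fromstring('<entry><lem>אלהים</lem><rdg ms="109">אלהי</rdg></entry>')

# Map it onto the TEI critical-apparatus elements.
app = ET.Element("app")
lem = ET.SubElement(app, "lem")
lem.text = custom.findtext("lem")
for r in custom.findall("rdg"):
    # TEI records witnesses in @wit as pointers to <witness> declarations.
    rdg = ET.SubElement(app, "rdg", wit="#ms" + r.get("ms"))
    rdg.text = r.text

tei = ET.tostring(app, encoding="unicode")
print(tei)  # <app><lem>אלהים</lem><rdg wit="#ms109">אלהי</rdg></app>
```

The production pipeline will perform the same mapping declaratively, with XSLT templates rather than ad hoc code.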
      <p>We plan to make all the data generated throughout the project, from the HTR models to the
XML and text files, publicly available. We envisage publishing all pertinent segmentation and
recognition models on Kraken’s Zenodo repository.21 For the HTR and OCR results, we could
either publish their different milestone stages in a separate repository on Zenodo, with pointers
from Biblissima+ (and, e.g., HTR-United), or directly on Biblissima+. Moreover, we will post all
the relevant material for the project at our GitHub address.22</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We discussed how traditional scholarly editions could offer a viable pathway to improving the
performance of current HTR models. Taking the concrete example of the REK project, we
illustrated how, by encoding the critical apparatus, we were able to generate complete
automatic transcriptions of witness texts, and how we plan to obtain from these transcriptions a
large amount of training data useful for the HTR of biblical Hebrew manuscripts.</p>
      <p>The accuracy values achieved so far are highly encouraging. All the Kraken models for
segmentation and transcription that we used have proven to be exceptionally performant, even
in handling highly complex texts such as the critical apparatus: their overall accuracy is never
lower than 97%.</p>
      <p>The decision to implement a rule-based parser for mining the apparatus has also been fruitful:
thanks to it, we have been able to automatically encode a huge amount of data (more than
65,000 apparatus entries) that it would have been unthinkable to encode manually, and this with
an accuracy of 98%.</p>
      <p>The automatic reconstruction of the texts of the witnesses has been equally efficient, being
fully automatable in 97% of the cases. The remaining portion, which necessitates manual
intervention, is still substantial, considering the number and complexity of the interventions needed for
each biblical book, but we are confident that we can increase the automation of variant
mapping (including, for example, the case of lemmata spanning multiple verses), thereby further
reducing the need for manual correction.</p>
      <p>The data and statistics presented here refer to the first volume of the Vetus Testamentum,
the processing of which is nearing completion. Our intention is to extend the methodology
discussed to the second volume (Fig. 15b), which will allow us to increase the number of
reconstructible witnesses up to 244, for a total of approx. 75,000 manuscript pages (Tab. 4).</p>
      <p>Moreover, we intend to incorporate the remaining 10 of the 20 identified
manuscripts mentioned in Section 3.1. This will enable us to double the quantity of pages with
highly accurate transcriptions, providing us with an augmented ground truth from which to
develop further enhanced HTR models specifically tailored for Hebrew Bible manuscripts.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>We would like to express our sincere thanks to Idan Dershowitz from the University of
Potsdam for his invaluable collaboration, as well as to Uriel Aiskovich for his assistance with the
manuscript segmentation task. We also wish to extend our appreciation for the support kindly
provided by the National Library of Israel’s staff.</p>
      <table-wrap id="tab4">
        <label>Tab. 4</label>
        <table>
          <thead>
            <tr><th>Book</th><th>No. mss</th><th>No. pages</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="3">First volume</th></tr>
            <tr><td>Genesis</td><td>97</td><td>5,820</td></tr>
            <tr><td>Exodus</td><td>103</td><td>5,150</td></tr>
            <tr><td>Leviticus</td><td>101</td><td>3,535</td></tr>
            <tr><td>Numbers</td><td>103</td><td>5,150</td></tr>
            <tr><td>Deuteronomy</td><td>108</td><td>4,644</td></tr>
            <tr><td>Joshua</td><td>65</td><td>1,950</td></tr>
            <tr><td>Judges</td><td>67</td><td>1,943</td></tr>
            <tr><td>I-II Samuel</td><td>67</td><td>4,623</td></tr>
            <tr><td>I-II Kings</td><td>65</td><td>4,745</td></tr>
            <tr><th colspan="3">Second volume</th></tr>
            <tr><td>Isaiah</td><td>72</td><td>3,528</td></tr>
            <tr><td>Jeremiah</td><td>71</td><td>4,473</td></tr>
            <tr><td>Ezekiel</td><td>69</td><td>3,795</td></tr>
            <tr><td>Minor Prophets</td><td>69</td><td>3,243</td></tr>
            <tr><td>Psalms</td><td>102</td><td>6,426</td></tr>
            <tr><td>Job</td><td>87</td><td>2,262</td></tr>
            <tr><td>Proverbs</td><td>76</td><td>1,748</td></tr>
            <tr><td>Megilloth</td><td>126</td><td>4,032</td></tr>
            <tr><td>Daniel</td><td>68</td><td>1,224</td></tr>
            <tr><td>Ezra-Nehemiah</td><td>71</td><td>2,130</td></tr>
            <tr><td>Chronicles</td><td>68</td><td>5,168</td></tr>
            <tr><td>Total</td><td>244</td><td>75,589</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Our research received generous funding from the Agence Nationale de la Recherche as part of
the Programme d’investissements d’avenir within the France 2030 framework, under the
reference ANR-21-ESRE-0005. Additionally, we benefited from funding by the European Union
through the MiDRASH project (ERC, project number 101071829). The views and opinions
expressed in this paper are those of the authors alone and do not necessarily represent those
of the European Union or the European Research Council Executive Agency; neither the
European Union nor the granting authority can be held responsible for them.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bambaci</surname>
          </string-name>
          . “
          <article-title>Critical Apparatus as Domain Specific Languages. A Rule-based Parser for Encoding an Eighteenth-Century Collation of Hebrew Manuscripts”</article-title>
          .
          <source>In: International Journal of Information Science and Technology 5.1</source>
          (
          <issue>2021</issue>
          ), pp.
          <fpage>22</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bambaci</surname>
          </string-name>
          . “
          <article-title>Digitizing Kennicott's Collation of the Hebrew Bible: Experiences of Encoding and of Computer-Assisted Stemmatic Analysis”</article-title>
          . In:
          <article-title>Jewish Studies in the Digital Age</article-title>
          . Ed. by
          <string-name>
            <given-names>G.</given-names>
            <surname>Zaagsma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stökl Ben Ezra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miriam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Michelle</surname>
          </string-name>
          , and
          <string-name>
            <surname>L. Amalia S</surname>
          </string-name>
          .
          <article-title>Studies in Digital History and Hermeneutics 5</article-title>
          . De Gruyter,
          <year>2022</year>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>334</lpage>
          . doi: 10.1515/9783110744828-014.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bambaci</surname>
          </string-name>
          . “
          <article-title>Is a Stemma Possible for the Hebrew Bible? Towards a Genealogy of Medieval Manuscripts Through Phylogenetic Analysis”</article-title>
          .
          <source>In:Materia Giudaica - Rivista dell'Associazione Italiana per lo Studio del Giudaismo Xxvi</source>
          .
          <volume>2</volume>
          (
          <issue>2021</issue>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Barthélemy</surname>
          </string-name>
          . “
          <article-title>Les manuscrits médiévaux et le texte tibérien classique”</article-title>
          . In:
          <source>Critique textuelle de l'Ancien Testament, 3. Ézéchiel, Daniel et les 12 Prophètes</source>
          . Vol.
          <volume>3</volume>
          . Orbis Biblicus et Orientalis 50. Fribourg/Göttingen: Éditions Universitaires/Vandenhoeck &amp; Ruprecht,
          <year>1992</year>
          , pp.
          <fpage>xix</fpage>
          -xcvi.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Boillet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-L.</given-names>
            <surname>Bonhomme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stutzmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kermorvant</surname>
          </string-name>
          . “HORAE:
          <article-title>An annotated dataset of books of hours”</article-title>
          .
          <source>In: Proceedings of the 5th International Workshop on Historical Document Imaging and Processing</source>
          .
          <year>2019</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>12</lpage>
          . doi: 10.1145/3352631.3352633.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>P. G. Borbone.</surname>
          </string-name>
          “Appendice - La tradizione medievale”.
          <source>In: Il libro del profeta Osea - Edizione critica del testo ebraico. Torino: Zamorani</source>
          ,
          <year>1990</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Clausner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pletschacher</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonacopoulos</surname>
          </string-name>
          . “
          <article-title>Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments”</article-title>
          .
          <source>In: Proceedings of the 2011 International Conference on Document Analysis and Recognition</source>
          .
          <year>2011</year>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>52</lpage>
          . doi: 10.1109/icdar.2011.19.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cohen</surname>
          </string-name>
          . “
          <article-title>The 'Masoretic Text' and the Extent of Its Influence on the Transmission of the Biblical Text in the Middle Ages”</article-title>
          . In: Studies in Bible and Exegesis. Ed. by U. Simon.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          Vol.
          <volume>2</volume>
          . Ramat Gan: Bar Ilan University Press,
          <year>1986</year>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <surname>G. B. De Rossi</surname>
          </string-name>
          .
          <article-title>Scholia critica in V.T. libros, seu supplementa ad varias sacri textus lectiones</article-title>
          .
          <source>Parma: Ex regio typographeo</source>
          ,
          <year>1798</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>G. B. De Rossi</surname>
          </string-name>
          .
          <article-title>Variae lectiones Veteris Testamenti</article-title>
          .
          <source>Parmae: Ex regio typographeo</source>
          ,
          <year>1784-1788</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Elliger</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <source>Biblia Hebraica Stuttgartensia. 5th ed. Stuttgart: Deutsche Bibelgesellschaft</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Penkower</surname>
          </string-name>
          .
          <article-title>“A Sheet of Parchment from a 10th or 11th Century Torah Scroll: Determining its Type among Four Traditions (Oriental, Sefardi, Ashkenazi, Yemenite)”</article-title>
          .
          <source>In: Textus 21.1</source>
          (
          <issue>2002</issue>
          ), pp.
          <fpage>235</fpage>
          -
          <lpage>264</lpage>
          . doi: 10.1163/2589255x-02101012.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Colutto</surname>
          </string-name>
          , G. Hackl, and
          <string-name>
            <surname>G. Mühlberger.</surname>
          </string-name>
          “
          <article-title>Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents”</article-title>
          . In: 1st International Workshop on Open Services and
          <article-title>Tools for Document Analysis</article-title>
          ,
          <source>14th IAPR International Conference on Document Analysis and Recognition</source>
          ,
          <string-name>
            <surname>OSTICDAR</surname>
          </string-name>
          <year>2017</year>
          , Kyoto, Japan, November 9-
          <issue>15</issue>
          ,
          <year>2017</year>
          . Vol.
          <volume>04</volume>
          .
          <year>2017</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          . doi: 10.1109/icdar.2017.307.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kennicott</surname>
          </string-name>
          .
          <article-title>Vetus Testamentum Hebraicum cum variis lectionibus</article-title>
          . Vol.
          <volume>1</volume>
          . Oxford: Clarendon,
          <year>1776</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kennicott</surname>
          </string-name>
          .
          <article-title>Vetus Testamentum Hebraicum cum variis lectionibus</article-title>
          . Vol.
          <volume>2</volume>
          . Oxford: Clarendon,
          <year>1780</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          . “
          <article-title>Kraken - An Universal Text Recognizer for the Humanities”</article-title>
          . In:
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tissot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stokes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Stökl Ben Ezra</surname>
          </string-name>
          .
          <article-title>“eScriptorium: An Open Source Platform for Historical Document Analysis”</article-title>
          .
          <source>In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)</source>
          . Vol.
          <volume>2</volume>
          .
          <year>2019</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          . doi: 10.1109/icdarw.2019.10032.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Parr</surname>
          </string-name>
          .
          <source>The Definitive ANTLR 4 Reference</source>
          . Dallas/Raleigh: Pragmatic Bookshelf,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Penkower</surname>
          </string-name>
          .
          <article-title>“A Tenth-century Pentateuchal MS from Jerusalem (MS C3), Corrected by Mishael ben Uzziel”</article-title>
          .
          <source>In: Tarbiz 58.1</source>
          (
          <issue>1988</issue>
          ), pp.
          <fpage>49</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schenker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. A. P.</given-names>
            <surname>Goldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Norton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>van der Kooij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pisano</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. De Waard</surname>
          </string-name>
          , and R. D. Weis, eds. Biblia Hebraica Quinta.
          <article-title>General Introduction and Megilloth</article-title>
          .
          <source>Stuttgart: Deutsche Bibelgesellschaft</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Schomaker</surname>
          </string-name>
          . “
          <article-title>Design considerations for a large-scale image-based text search engine in historical manuscript collections”</article-title>
          .
          <source>In: it - Information Technology 58.2</source>
          (
          <issue>2016</issue>
          ), pp.
          <fpage>80</fpage>
          -
          <lpage>88</lpage>
          . doi: 10.1515/itit-2015-0049.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Segal</surname>
          </string-name>
          . “
          <article-title>Methodological Considerations in the Preparation of an Edition of the Hebrew Bible”</article-title>
          . In:
          <source>The Text of the Hebrew Bible and Its Editions</source>
          . Leiden, The Netherlands: Interactive Factory,
          <year>2017</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stokes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stökl Ben Ezra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tissot</surname>
          </string-name>
          , and
          <string-name>
            <surname>E. Gargem.</surname>
          </string-name>
          “
          <article-title>The eScriptorium VRE for Manuscript Cultures”</article-title>
          . In: Ancient Manuscripts and Virtual Research Environments, Special issue of Classics 18 (
          <year>2021</year>
          ). Ed. by
          <string-name>
            <given-names>C.</given-names>
            <surname>Clivaz</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Allen</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Stokes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stökl Ben Ezra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Tissot</surname>
          </string-name>
          .
          <article-title>EScripta: A New Digital Platform for the Study of Historical Texts</article-title>
          and Writing.
          <year>2019</year>
          . doi:
          <volume>10</volume>
          .34894/bixswx.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [25]
          <article-title>“BiblIA - A General Model for Medieval Hebrew Manuscripts and an Open Annotated Dataset”</article-title>
          .
          <source>In: The 6th International Workshop on Historical Document Imaging and Processing</source>
          . HIP ’21
          . New York, NY, USA: Association for Computing Machinery,
          <year>2021</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>66</lpage>
          . doi: 10.1145/3476887.3476896.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Toselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Smith.</surname>
          </string-name>
          <article-title>“Digital Editions as Distant Supervision for Layout Analysis of Printed Books”</article-title>
          . In:
          <source>Document Analysis and Recognition - ICDAR 2021</source>
          .
          <year>2021</year>
          , pp.
          <fpage>462</fpage>
          -
          <lpage>476</lpage>
          . doi: 10.1007/978-3-030-86331-9_30.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>