=Paper=
{{Paper
|id=Vol-3724/paper6
|storemode=property
|title=A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts
|pdfUrl=https://ceur-ws.org/Vol-3724/paper6.pdf
|volume=Vol-3724
|authors=Christoph Werner,Zacharias Shoukry,Soham Al-Suadi,Frank Krüger
}}
==A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts==
Christoph Werner (1,*), Zacharias Shoukry (2), Soham Al-Suadi (2) and Frank Krüger (1)
(1) Hochschule Wismar – University of Applied Sciences, Philipp-Müller-Straße 14, 23966 Wismar, Germany
(2) University of Rostock, Universitätsplatz 1, 18055 Rostock, Germany
Abstract
The analysis of textual variants of verses in the New Testament across different manuscripts has mainly
been done by close reading with manual effort. With the increasing number of transcriptions of the
different manuscripts, quantitative analyses (so-called distant reading) can be used to search for patterns
of omission, addition, or other variations, to formulate novel hypotheses to be investigated by close
reading. In this work, we present a corpus of biblical names including spelling variation and inflections
and their mentions in the transcriptions of the New Testament. By integrating and semantically enriching
the data collected from different sources, we established a corpus that can be used for the quantitative
study of omission, addition, and variation of such biblical names. To illustrate the corpus, we implement
some use cases and show that well-known cases can be quantitatively reproduced. The corpus and all
code are published under open licenses to enable reproduction, update, and maintenance.
Keywords
New Testament, Biblical Names, Textual Variation Units
1. Introduction
Research on the editions of the New Testament involves the study of textual variations across
different manuscripts from several centuries and thus reflects the cultural background of such
changes. Besides differences due to small grammatical variations, verses differ in their mention
of biblical characters. For instance, additions, omissions, and other variations of biblical names
can be observed, which are results of copy errors or selection due to cultural, gender, or other
biases. The well-known case of Junia(s) and Julia, for instance, where both names are used in
different variations of the same verse, is a subject of discussion in the field of textual criticism. The
omission of Damaris in Acts 17:34 of the Codex Bezae is another case, leading to discussions
about general gender biases of the manuscript itself.
SemDH 2024: First International Workshop of Semantic Digital Humanities, May 26 or May 27, 2024, Hersonissos, Greece
Email: christoph.werner@hs-wismar.de (C. Werner)
ORCID: 0009-0008-9907-251X (C. Werner); 0000-0002-9784-7034 (Z. Shoukry); 0000-0003-1098-208X (S. Al-Suadi); 0000-0002-7925-3363 (F. Krüger)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
With textual criticism, the above conflicting instances have been identified by close reading,
the manual inspection and interpretation of the variations of verses across different manuscripts.
Due to the long-lasting transcription efforts, for instance, by the Institute for New Testament
Textual Research (INTF) or the International Greek New Testament Project (IGNTP), the basis
for automatic analyses has been established. In this work, we built upon the transcription
efforts by integrating the textual data from both sources and enriching it semantically. In
particular, the contributions of this paper are: (1) by integrating different sources, we compiled
a corpus of transcribed verses of the New Testament, which enables automated investigation of
textual variations; (2) we compiled a dictionary of biblical names including their variations,
covering grammatical inflection and other typical variations; and (3) by analyzing omissions
of biblical names in verses across different manuscripts, we illustrate the relevance of the
data and quantitatively reproduce well-known findings. In the following, we first give a short
introduction to the New Testament, its history, and the sources of verse variation. We then
outline the data collection and processing and describe relevant characteristics of the generated
corpus. Finally, we illustrate how the corpus can be used to generate hypotheses for further
analyses based on close reading.
2. History of Editions of the New Testament
If you open a Bible today, it is usually divided into two main parts, which have different headings
depending on the edition. Common headings are “Old Testament” and “New Testament”. This
second part is a collection of 27 smaller writings that were most likely all written in Greek in
the 1st–2nd century CE. In the 4th century, Jerome and others began to collect various Latin
translations and compile a uniform, revised text from them, which resulted in the Vulgate,
which became increasingly standardized and established over the course of the Middle Ages
and was finally declared the authentic text by the Catholic Church in the 16th century. Erasmus
of Rotterdam, who was guided by the humanist ideal “Ad fontes–To the sources”, was also
active during this period. He was not satisfied with a Latin translation, but wanted to return to
the older Greek tradition. The problem, then as now, is that the autographs, i.e. the original
papyri on which the New Testament texts were written, no longer exist. If we take all the Greek
manuscripts known today from the 2nd–19th centuries together, we have around 5,700 (footnote 1), and
this figure does not include the thousands of manuscripts of translations into Latin, Coptic,
Syriac, etc., not to mention the manuscripts of early church authors with biblical quotations,
which are also considered textual witnesses. So we have all kinds of versions of the same texts,
which naturally lead to variants that differ from one another. Almost all text-critical editions
from the 16th–20th centuries were compiled by comparing manuscripts individually and noting
the deviations by hand.
The New Testament (NT) manuscripts are classified into four distinct categories: papyri,
majuscules, minuscules, and lectionaries. Papyri, being the oldest witnesses of the NT, often
exist in fragmented form (see Figure 1). Majuscules, also referred to as biblical uncials, are
characterized by their use of majuscule letters, which feature minimal ascenders and descen-
ders. In contrast, minuscules are written in a small, cursive Greek script. Lectionaries can be
encountered in both majuscule and minuscule Greek lettering styles. The Institute for New
Testament Textual Research (INTF) at the University of Münster has cataloged all currently
known manuscripts to the best of their ability in their New Testament Virtual Manuscript Room
(NTVMR).
Footnote 1: This is an estimate from September 2023 by [1].
Figure 1: Fragmentation of papyrus P21 [2]
Various numbering schemes are employed for the above-mentioned manuscripts.
With the Gregory-Aland Scheme (the de facto standard for biblical manuscript referenc-
ing), IDs differ depending on the manuscript type. Papyrus manuscripts are denoted by a
Gothic/Black-letter P followed by a superscript number (e.g., 𝔓52 ), often simplified to P52 for
ease of display. Majuscules are identified by a leading zero followed by an incremental number
(e.g., 0166). Minuscules are designated by an incremental document number alone. Lectionaries
are indicated by a leading ℓ followed by an incremental number (e.g., ℓ2005). A capital L is often
used in place of ℓ (e.g., L2005) due to display limitations.
The INTF Scheme uses a different approach. Instead of preceding letters (𝔓, ℓ, or 0), it
combines the document number with a leading digit indicating the manuscript type (1 for
papyrus, 2 for majuscule, 3 for minuscule, and 4 for lectionary). Padding zeros are inserted
between the leading digit and the document number – the document number is identical to its
corresponding document number in the Gregory-Aland Scheme – to form a five-digit Document
Identifier Number (docID). For instance, the Gregory-Aland notation 𝔓52 is equivalent to
10052, indicating a papyrus manuscript. Similarly, the number 0166 is transformed into 20166,
representing a majuscule manuscript. The Gregory-Aland manuscript 365 corresponds to 30365
in the INTF notation, denoting a minuscule manuscript. Lastly, ℓ2005 is
the same as 42005, indicating a lectionary manuscript.
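The mapping between the two schemes can be sketched in a few lines of Python. This is a minimal illustration of the rules described above, not the code used to build the corpus; the function names are hypothetical, and suffixed numbers (such as supplements) are deliberately rejected rather than guessed at.

```python
import re

# Leading docID digit per manuscript type, as described in the text.
TYPE_DIGIT = {"papyrus": 1, "majuscule": 2, "minuscule": 3, "lectionary": 4}
# Gregory-Aland prefix per leading docID digit (simplified P/L notation).
GA_PREFIX = {1: "P", 2: "0", 3: "", 4: "L"}


def ga_to_docid(ga: str) -> int:
    """Convert a Gregory-Aland number to a five-digit INTF docID."""
    ga = ga.strip().replace("𝔓", "P").replace("ℓ", "L")
    if ga.startswith("P"):
        kind, number = "papyrus", ga[1:]
    elif ga.startswith("L"):
        kind, number = "lectionary", ga[1:]
    elif ga.startswith("0"):
        kind, number = "majuscule", ga[1:]
    else:
        kind, number = "minuscule", ga
    if not re.fullmatch(r"[0-9]{1,4}", number):
        raise ValueError(f"unsupported Gregory-Aland number: {ga!r}")
    return TYPE_DIGIT[kind] * 10000 + int(number)


def docid_to_ga(docid: int) -> str:
    """Convert a five-digit INTF docID back to Gregory-Aland notation."""
    digit, number = divmod(docid, 10000)
    return f"{GA_PREFIX[digit]}{number}"
```

Applied to the four examples in the text, `ga_to_docid` maps P52, 0166, 365 and ℓ2005 to 10052, 20166, 30365 and 42005.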
In this paper, the Gregory-Aland scheme is applied when mentioning or referencing
manuscripts.
3. Data Collection and Processing
3.1. NTVMR Data
The New Testament Virtual Manuscript Room (NTVMR), managed by the INTF, offers an API (footnote 2)
from which we retrieved docIDs of interest.
Cataloguing of the manuscripts has presumably been completed, but the numbering and its
correction [1] are still the subject of discussion. Table 1 shows, by manuscript category, the
number of manuscripts currently catalogued, the docID ranges in use, and the number of duplicates
and merges, which together yield the total number of known manuscripts. As Table 2 shows,
imaging, indexing, and transcribing are still in progress. In percentage terms, most progress
has so far been made with the papyri, followed by the majuscules, minuscules, and lectionaries.
Manuscript Type | Catalogued | Ranges of docIDs in use | Removed/Combined | Total
Papyri | 142 | 10001–10142 | 6 | 136
Majuscules | 332 | 20001–20326, 29994–29999 | 42 | 290
Minuscules | 3,060 | 30001–33020, 39960–39999 | 159 | 2,901
Lectionaries | 2,633 | 40001–42556, 49920–49975, 49979–49999 | 135 | 2,498
Table 1
Number of manuscripts catalogued by the INTF (as of 2024-03-05)
The API provides endpoints for accessing the transcriptions (footnote 3) and the metadata (footnote 4)
of a specific manuscript. Both request types require a docID to obtain, respectively, a TEI
XML file with transcription data or a JSON file with metadata from the NTVMR server.
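As a minimal sketch of how such requests can be prepared, the following Python snippet builds the two request URLs for a given docID. The URL patterns are the ones given in the footnotes; the function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base path of the NTVMR API, taken from the footnote URLs.
BASE = "https://ntvmr.uni-muenster.de/community/vmr/api"


def transcript_url(doc_id: int) -> str:
    """URL of the raw TEI transcription of all pages of one manuscript."""
    query = urlencode({"docID": doc_id, "pageID": "ALL", "format": "teiraw"})
    return f"{BASE}/transcript/get/?{query}"


def metadata_url(doc_id: int) -> str:
    """URL of the JSON metadata record of one manuscript."""
    query = urlencode({"docID": doc_id, "format": "json"})
    return f"{BASE}/metadata/manuscript/get/?{query}"
```

The resulting strings can be passed to any HTTP client; error handling and rate limiting are left out of this sketch.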
3.2. IGNTP Data
The International Greek New Testament Project (IGNTP) provides full transcriptions across
selected manuscripts for John, Galatians, and Ephesians, whereas the transcriptions of Philippians
and 1 Corinthians are marked as ‘in progress’. These data are accessible via direct downloads
[3, 4] of zipped XML files. In addition to the transcriptions, the TEI files provided by the IGNTP
also contain data on the respective manuscripts.
3.3. List of Names
As no machine-readable and machine-processable list of names in the Greek New Testament exists,
we created one in a largely manual and iterative process, described in the following.
An initial compilation of biblical names from the New Testament was gathered from FactGrid.
We sought data on all individuals mentioned in any of the books of the NT, resulting in a list of
305 biblical characters. It must be emphasized that biblical characters may share identical names,
such as Mary of Bethany and Mary Magdalene. Given the theological debate surrounding such
potential identity overlap of certain biblical individuals, we opt to consolidate characters sharing
the same name and subsequently refer exclusively to biblical names. The list of biblical names
was compared to and expanded with information from [5], resulting in a total of 319 biblical
names. Variations in grammatical cases were sourced from the Louw-Nida lexicon [6]. The
subsequent search (described in Section 3.6) led to an iterative refinement of alternative spellings,
as we checked the verses marked as ‘missing a name’ for spelling variants of the given name.
Footnote 2: https://ntvmr.uni-muenster.de/community/vmr/api/metadata/liste/get/
Footnote 3: https://ntvmr.uni-muenster.de/community/vmr/api/transcript/get/?docID=&pageID=ALL&format=teiraw
Footnote 4: https://ntvmr.uni-muenster.de/community/vmr/api/metadata/manuscript/get/?docID=&format=json
Manuscript Type | Catalogued (count) | Catalogued (%) | Imaged (count) | Imaged (%) | Indexed (count) | Indexed (%) | Transcribed (count) | Transcribed (%)
Papyri | 1,351 | 100 | 1,318 | 97.56 | 1,280 | 94.74 | 1,290 | 95.48
Majuscules | 26,812 | 100 | 25,836 | 96.36 | 22,921 | 85.49 | 6,902 | 25.74
Minuscules | 1,318,117 | 100 | 1,229,597 | 93.28 | 349,604 | 26.52 | 44,808 | 3.40
Lectionaries | 802,998 | 100 | 412,644 | 51.39 | 20,570 | 2.56 | 3,315 | 0.41
Total | 2,149,278 | | 1,669,395 | | 394,375 | | 56,315 |
Table 2
Statistics on processed manuscript pages by the NTVMR (data from API request to https://ntvmr.uni-muenster.de/community/vmr/api/statistics/pages/ as of 2024-03-05)
Manuscript Type | Number of Manuscripts | Removed/Combined | Total | Verses (IGNTP) | Verses (INTF) | Verses (ITSEE)
Papyri | 48 | 0 | 48 | 39 | 0 | 1,889
Majuscules | 105 | 0 | 105 | 8,597 | 3,002 | 24,324
Minuscules | 348 | 2 | 346 | 103,236 | 0 | 82,948
Lectionaries | 43 | 0 | 43 | 8,996 | 0 | 27,990
Table 3
Number of Manuscripts and Verses (by publisher) by Manuscript Type in the IGNTP Corpus (as of 2024-03-05)
To facilitate later searches for names and their variations, it is imperative to compile a
comprehensive list of all known spelling variations associated with each individual. This
involves consolidating the spelling variations of all grammatical cases into a list, as well as removing
diacritics and converting list entries to lowercase. The resulting list of lists
encapsulates the diverse variants of individuals’ names.
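The normalization step can be sketched with Python's standard `unicodedata` module. This is an illustrative assumption about how diacritics might be removed (decomposing each character and dropping combining marks), not necessarily the procedure used for the corpus; the function names and the sample forms are hypothetical.

```python
import unicodedata


def normalize(word: str) -> str:
    """Strip diacritics (combining marks) and convert to lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped).lower()


def variant_list(forms):
    """Normalize all forms of one name, dropping duplicates but keeping order."""
    variants = []
    for form in forms:
        normalized = normalize(form)
        if normalized not in variants:
            variants.append(normalized)
    return variants
```

Normalizing both the name list and the transcriptions in the same way means that a later search compares like with like, regardless of accents, breathings, or capitalization.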
3.4. Parsing TEI Files for Transcription Data
Prior to parsing the previously acquired TEI files, a validity check is conducted. In this
check, 41 of 1617 files (2.5%) are identified as invalid XML and are excluded
from parsing, and consequently from subsequent analysis. This keeps the parsing process
reasonably automatable. Various factors contribute to the invalidity of XML files,
such as mismatched opening and closing tags (n=7, 17%), undefined entities (n=10,
24%), duplicated attributes (n=1, 2.4%), junk after the document element (n=1, 2.4%), and
syntax errors resulting from invalid attribute names and/or values (n=22, 54%). Note
that only files originating from the INTF are invalid.
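A check of this kind can be sketched with the XML parser from the Python standard library. The snippet below only tests well-formedness, which is an assumption about what the validity check covers; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_error(content: str):
    """Return None if the XML is well-formed, otherwise the parser message."""
    try:
        ET.fromstring(content)
    except ET.ParseError as err:
        return str(err)
    return None
```

The parser message names the kind of problem (for example a mismatched tag or an undefined entity), so collecting the messages over all files is one possible way to tabulate error categories like those listed above.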
In essence, a TEI-XML file comprises a TEI header containing metadata about the document
and a text block containing transcription data. TEI markup [7] is utilized to represent the
structural and semantic elements of the text, such as paragraphs, headings, lists, and quotations,
using XML tags. For instance, ‘div’ tags delineate divisions like books or chapters,
‘ab’ tags represent verses, and ‘w’ tags denote words. Some words or entire parts may be unclear
or missing, which are marked using ‘unclear’ tags for ambiguous portions and ‘gap’ tags
for missing parts. Additionally, ‘supplied’ tags indicate where known content has been
inserted instead of setting a ‘gap’ tag.
The transcription data follows a hierarchical structure based on the folio (footnote 5), organized by book,
chapter, verse, and word. Since some verses span multiple pages, this hierarchy may be repeated
several times within a single transcription document. Additionally, the same verse can appear
multiple times on different folios of a document, particularly in lectionaries.
When processing the data verse by verse, we first link all related ‘ab’ tags, where the values
of the ‘part’ attribute (‘I’ for initial and ‘F’ for final) are decisive. If these attribute values occur
in consecutive ‘ab’ tags, these are combined. Otherwise, ‘ab’ tags with the same
‘name’ attribute value are treated as individual verse transcriptions.
The verse blocks generated in this way are then searched for ‘w’ tags and a list of the
‘w’ tags found is created. This list is then parsed for text, resulting in a string representing the
transcription. This string is then stripped of diacritics and converted to lowercase.
As it could later be of importance which parts of the transcription have been marked as
supplied or unclear, we generate a string of the same structure as the transcription string. In
this string, all characters have an initial value of ‘c’ (clear). By checking against the previously
produced list of ‘w’ tags, we are able to set unclear characters to ‘u’ and supplied
characters to ‘s’. For example, the text string extracted from Listing 1 is shown in (1), and the string
that indicates the readability of the letters during transcription is shown in (2).
αμμιναδαβ δε εγεννησεν (1)
uuusuussc cc ccccccccu (2)
...
αμ
μ
ι
να
δα β
δε
εγεννησεν
...
Listing 1: Example for clear, unclear, and supplied characters in the transcription of P1 by INTF
‘gap’ tags are not taken into account during parsing, as we are solely interested in the
transcribed text. For the sake of completeness, however, these gaps should be included in a later
version of the data.
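The derivation of the two parallel strings can be sketched as follows. This is a minimal illustration, not the parser used for the corpus: it assumes the standard TEI element names ‘w’, ‘unclear’, ‘supplied’, and ‘gap’, the sample markup is constructed by hand to reproduce strings (1) and (2), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Readability mark per element name; everything else inherits the outer mark.
MARKS = {"unclear": "u", "supplied": "s"}

# Hand-made sample verse (an assumption, not the original transcription file).
SAMPLE = """<ab n="example">
  <w><unclear>αμ</unclear><unclear>μ</unclear><supplied>ι</supplied><unclear>να</unclear><supplied>δα</supplied>β</w>
  <w>δε</w>
  <w>εγεννησε<unclear>ν</unclear></w>
</ab>"""


def verse_strings(verse_xml: str):
    """Return (text, marks) for one verse; both strings have the same shape."""
    root = ET.fromstring(verse_xml)
    words = []  # one list of (character, mark) pairs per word

    def visit(elem, mark, in_word):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace prefix, if any
        if tag == "gap":
            return  # gaps carry no transcribed text
        mark = MARKS.get(tag, mark)
        if tag == "w":
            words.append([])
            in_word = True
        if in_word and elem.text:
            words[-1].extend((c, mark) for c in elem.text if not c.isspace())
        for child in elem:
            visit(child, mark, in_word)
            # Text after a child belongs to the current element again.
            if in_word and child.tail:
                words[-1].extend(
                    (c, mark) for c in child.tail if not c.isspace()
                )

    visit(root, "c", False)
    words = [w for w in words if w]
    text = " ".join("".join(c for c, _ in w) for w in words)
    marks = " ".join("".join(m for _, m in w) for w in words)
    return text, marks
```

Because every character is stored together with its mark, the two output strings stay aligned position by position, which is what allows a later lookup of whether a matched name was read clearly, read with uncertainty, or supplied by the transcriber.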
Footnote 5: In this context, a folio is a manuscript page.
Column | Description | Format | Example data
ga | Document Identifier by Gregory Aland Scheme | String | P1
bkv | Verse Identifier by BKV Scheme | String | B01K1V1
nkv | Verse Identifier by NKV Scheme | String | Matt.1.1
text | Transcription text | String | βιβλος γενεσεως ιυ χυ υυ δαυιδ υιου αβρααμ
marks | Marking of clear, unclear and supplied characters | String | cccccc cccccccc cc cc cc ccccc ssss cccccc
publisher | Transcription publisher | String | The Institut für neutestamentliche Textforschung
source | Download source of transcription | String | ntvmr
Table 4
Data Dictionary for Verses Collection
The extracted text and readability marks are saved alongside the docID, GA number, source,
publisher, and verse identifier. Note that ‘source’ describes the download source
(either ‘igntp’ or ‘ntvmr’).
IGNTP and INTF use different forms of verse identifiers. The IGNTP bases its nomenclature
on [8], but all separators are replaced by dots and spaces are removed (e.g., ‘1 Cor 1:3’ becomes
‘1Cor.1.3’). The INTF instead assigns an ascending alphanumeric identifier to each book, starting
with B01 for the Gospel of Matthew and ending with B27 for the Book of Revelation. The
numbering follows the listing of the books of the New Testament in [8]; similarly, chapters
within a book are numbered with K and their verses with V (e.g., ‘1 Cor 1:3’ becomes ‘B07K1V3’).
These verse identifiers are referenced below as BKV (INTF scheme) and NKV (IGNTP scheme).
Since one of the two is always present, we are able to derive the other and save it alongside.
All data extracted and generated during TEI parsing is saved to a data frame with the format
depicted in Table 4.
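Deriving one verse identifier from the other can be sketched as below. Only the pairs B01/Matt and B07/1Cor are confirmed by the examples in the text; the remaining entries of the book table are assumptions based on the usual order of the New Testament books, the table is deliberately partial, and the function names are hypothetical.

```python
import re

# Partial book table (assumption except for 1 and 7); extend to all 27 books.
BOOKS = {1: "Matt", 4: "John", 5: "Acts", 6: "Rom", 7: "1Cor"}
BOOK_NUMBERS = {name: number for number, name in BOOKS.items()}


def bkv_to_nkv(bkv: str) -> str:
    """Convert e.g. 'B07K1V3' to '1Cor.1.3'."""
    match = re.fullmatch(r"B([0-9]{2})K([0-9]+)V([0-9]+)", bkv)
    if match is None:
        raise ValueError(f"not a BKV identifier: {bkv!r}")
    book, chapter, verse = (int(group) for group in match.groups())
    return f"{BOOKS[book]}.{chapter}.{verse}"


def nkv_to_bkv(nkv: str) -> str:
    """Convert e.g. 'Matt.1.1' to 'B01K1V1'."""
    name, chapter, verse = nkv.rsplit(".", 2)
    return f"B{BOOK_NUMBERS[name]:02d}K{int(chapter)}V{int(verse)}"
```

Splitting from the right keeps book names that start with a digit, such as ‘1Cor’, intact.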
3.5. Manuscript Metadata
During the parsing process of TEI files sourced from IGNTP and NTVMR, we successfully
extracted essential metadata such as the GA number, docID, and occasionally a manuscript label.
Further augmentation of this dataset was achieved through the incorporation of JSON files
housing comprehensive metadata for each manuscript within the NTVMR. This supplementary
information encompasses details such as docID, GA number (utilized for verse linkage), specifics
on the location of storage (shelf instances), estimated period of origin, dimensions (both width
and height), as well as counts of leaves, pages, columns, and lines. Notably, each page of a
manuscript is accompanied by pertinent data regarding indexed content (verses on a page),
hyperlinks to transcriptions and images, and indications of image protection necessitating
NTVMR expert account authentication for viewing.
To further enrich our dataset with publicly accessible information, we conduct a query
on dbpedia to acquire additional manuscript data. The query yields results comprising URIs,
manuscript labels, manuscript types and numbers, and temporal and spatial origins and/or
discoverers of the manuscripts.
Column | Description | Format | Example data
docID | Document Identifier by INTF scheme | Integer | 30461
ga | Document Identifier by Gregory Aland Scheme | String | 461
century | Century in Roman letters (with some exceptions being numeric) | String | 835
pagesCount | Number of pages | String | 688
leavesCount | Number of leaves | String | 344
dbpedia | Link to dbpedia | String | http://dbpedia.org/resource/Uspenski_Gospels
label | manuscript name | String | Uspenski Gospels
source | Sources of data | String | ntvmr
Table 5
Data Dictionary for Manuscript Metadata Collection
Upon scrutinizing the retrieved data, it became evident that manuscript numbers exhibit varia-
tion, being represented in distinct formats such as "𝔓48"@en, "'''𝔓24"@en, "ℓ2137"@en,
and "ℓ 2144"@en or occasionally are presented solely as numeric values which do not give
any clue on the type of manuscript. However, based on the RDF property ‘form’, the manuscript
type is given as papyrus, uncial, minuscule, or lectionary.
To ensure uniformity and facilitate seamless data integration based on the respective GA
number, a cleanup process was necessary. This involved removing all non-numeric characters
from the manuscript number string. Subsequently, the remaining numerical value was con-
catenated with an initial character determined by the RDF property ‘form’, adhering to the GA
notation convention.
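The cleanup can be sketched as a small function. The prefix table follows the simplified Gregory-Aland notation (P, leading 0, none, L); stripping existing leading zeros from uncial numbers before re-adding one is an assumption made here to avoid doubled zeros, and the function name is hypothetical.

```python
import re

# GA prefix per value of the 'form' property.
PREFIX = {"papyrus": "P", "uncial": "0", "minuscule": "", "lectionary": "L"}


def to_ga(raw_number: str, form: str) -> str:
    """Build a GA number from a raw manuscript number string and its form."""
    form = form.lower()
    digits = re.sub(r"[^0-9]", "", raw_number)  # drop all non-numeric characters
    if not digits:
        raise ValueError(f"no digits in {raw_number!r}")
    if form == "uncial":
        digits = digits.lstrip("0")  # assumption: avoid a doubled leading zero
    return PREFIX[form] + digits
```

With this, the differently formatted strings shown above reduce to a common form, for example P48, P24, L2137, and L2144.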
After merging the different manuscript data sources, we obtain a CSV file with the columns
described in Table 5.
3.6. Search for Names
The main task involves the identification of occurrences and subsequent detection of omissions
within the verses dataset built during TEI parsing.
We therefore take the previously processed names, add a unique numeric nameID per name
(for ease of later use), explode the list of variations, and add a unique numeric variantID to each
generated variant entry (see Table 6).
After retrieving the unique BKV verse identifiers from the list of verse transcriptions (see
Table 4 for its data dictionary), the search is parallelized such that all name
variants are searched on a BKV-by-BKV basis.
This parallelized approach involves filtering a copy of all verse transcriptions for entries
corresponding to a given BKV verse identifier. Subsequently, each transcription text field
is scanned for all name variants. Upon identification of a variant, the associated nameID is
appended to a set specific to that verse transcription (denoted as ‘found’), representing all
detected names within. Additionally, a BKV-specific set (denoted as ‘occurrences’) is updated to
include all names detected in any text associated with the given BKV. By comparing the ‘found’
set against the ‘occurrences’ set for the respective BKV, we can ascertain the names missing
from each transcription.
Column | Description | Format | Example data
label:en | Name in English | String | Aaron, brother of Mose
gender | Genus of the person | String | m
label:el | Name in Greek | String | ααρων
factgrid | FactGrid ItemID | String | Q165847
variant | Spelling variant | String | ααρωνος
wordID | Unique word identifier | Integer | 4
variantID | Unique variant identifier | Integer | 7
Table 6
Data Dictionary for Name Collection
Column | Description | Format | Example data
ga | Document Identifier by GA scheme | String | P1
bkv | Verse Identifier by BKV Scheme | String | B01K1V1
text | Transcription text | String | βιβλος γενεσεως ιυ χυ υυ δαυιδ υιου αβρααμ
wordID | Unique word identifier | Integer | 23
variantID | Unique variant identifier | Integer | 77
occurrence | Indicator of occurrence | Boolean | True
Table 7
Data Dictionary for Name Occurrences Collection
Afterwards, the found and missing sets are exploded separately for each verse transcription.
As a result, each row of the data frame contains information on a certain verse and the
occurrence or omission of one specific name in it. The corresponding data dictionary for this
data frame is given in Table 7.
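The search described in this section can be sketched without parallelization as follows. The sketch uses plain Python structures instead of a data frame, matches variants against whole whitespace-separated tokens (an assumption; the matching rule is not specified above), and runs on toy data with placeholder manuscript labels ‘A’ and ‘B’. All names in the code are hypothetical.

```python
from collections import defaultdict

# Toy inputs: variant spelling -> nameID, and two transcriptions of one verse.
VARIANTS = {"ελεαζαρ": 1, "ματθαν": 2}
VERSES = [
    {"ga": "A", "bkv": "B01K1V15", "text": "ελιουδ δε εγεννησεν τον ελεαζαρ"},
    {"ga": "B", "bkv": "B01K1V15", "text": "ελιουδ δε εγεννησεν"},
]


def find_names(verses, variant_to_name):
    """Return rows (ga, bkv, nameID, occurrence) for every verse transcription."""
    by_bkv = defaultdict(list)
    for verse in verses:
        by_bkv[verse["bkv"]].append(verse)

    rows = []
    for bkv, group in by_bkv.items():
        found_sets = []      # one 'found' set per transcription
        occurrences = set()  # names seen in any transcription of this verse
        for verse in group:
            tokens = set(verse["text"].split())
            found = {
                name_id
                for variant, name_id in variant_to_name.items()
                if variant in tokens
            }
            found_sets.append((verse["ga"], found))
            occurrences |= found
        # Names in 'occurrences' but not in 'found' are missing from that text.
        for ga, found in found_sets:
            for name_id in sorted(occurrences):
                rows.append((ga, bkv, name_id, name_id in found))
    return rows
```

Since each verse identifier is processed independently, the outer loop is the natural unit to distribute across worker processes.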
4. Analysis of the Transcription Corpus
4.1. Transcription Overview
At first glance at the transcription corpus, we can see that the INTF has the most transcribed
verses on papyri, minuscules, and majuscules as well as in total, followed by the Institute for
Textual Scholarship and Electronic Editing (ITSEE) and the IGNTP.
Because our transcription corpus is multi-source, incorporating
transcriptions from IGNTP and INTF, duplicates of verse transcriptions are to be expected.
When considering duplicates as entries with identical docID/bkv combinations, we identify
3817 instances. However, to assess whether these duplicates are also identical in the transcribed
text, we examine entries with identical docID/bkv/text combinations, revealing 3624 duplicates.
Figure 2: Relative occurrence frequency of a selected subset of biblical characters across all verses. The
red line depicts the median (excluding 0s). Characters are ordered by the median relative occurrence
frequency. (Axes: Biblical Character; Relative Frequency, 0.00 to 1.00.)
Duplicate transcriptions remain in the corpus for the sake of completeness of the collection
of transcriptions.
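The two duplicate counts differ only in the set of columns that is compared. A minimal sketch, assuming that a duplicate is every surplus row beyond the first one with the same key (the counting rule behind the reported figures may differ), with a hypothetical function name and toy rows:

```python
from collections import Counter

# Toy rows; the values are placeholders, not corpus data.
ROWS = [
    {"docID": 1, "bkv": "B01K1V1", "text": "abc"},
    {"docID": 1, "bkv": "B01K1V1", "text": "abc"},
    {"docID": 1, "bkv": "B01K1V1", "text": "abd"},
    {"docID": 2, "bkv": "B01K1V1", "text": "abc"},
]


def count_duplicates(rows, keys):
    """Count rows that repeat an earlier row on the given key columns."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return sum(n - 1 for n in counts.values())
```

Comparing `count_duplicates(ROWS, ["docID", "bkv"])` with `count_duplicates(ROWS, ["docID", "bkv", "text"])` shows how many of the repeated verse entries also agree in their transcribed text.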
4.2. Analysis of Additions, Omissions, and Variations of Names
To illustrate the value of the corpus presented in this work, we present some use cases in the
following. To this end, we first analyze omissions of names by computing the relative frequency
of the occurrence of a biblical character across all variations of a verse. Figure 2 depicts a subset
of biblical characters including their relative occurrence frequency within different verses. For
each verse and character, the frequency was determined by the ratio of manuscripts where
the character was included in a verse and the overall number of manuscripts that actually
contain this verse. Frequencies of 0.0 were left out, as they represent verses where the particular
character was never included. From the figure, several observations can be made, some of which
are summarized in the following.
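The frequency computation can be sketched as follows, assuming occurrence rows of the form (manuscript, verse, name, occurrence) and counting each manuscript once per verse. The function name is hypothetical and the sample rows are toy data.

```python
from collections import defaultdict


def relative_frequencies(rows):
    """Map (verse, name) to the share of manuscripts that include the name."""
    containing_verse = defaultdict(set)  # manuscripts that contain the verse
    containing_name = defaultdict(set)   # manuscripts that include the name
    for ga, bkv, name_id, occurrence in rows:
        containing_verse[(bkv, name_id)].add(ga)
        if occurrence:
            containing_name[(bkv, name_id)].add(ga)
    # Frequencies of 0.0 are left out, as in the figure.
    return {
        key: len(containing_name[key]) / len(manuscripts)
        for key, manuscripts in containing_verse.items()
        if containing_name[key]
    }


# Toy rows: a name included in three of four manuscripts of one verse.
SAMPLE_ROWS = [
    ("A", "B01K1V15", 1, True),
    ("B", "B01K1V15", 1, True),
    ("C", "B01K1V15", 1, True),
    ("D", "B01K1V15", 1, False),
]
```

For the toy rows the result is 0.75 for the single verse and name, analogous to the ratios such as 106/117 discussed for Figure 2.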
Firstly, for Esau one verse with a relative occurrence frequency of 1 is evident, indicating that
the name is included in all variations of this verse. Secondly, Eleazar has one high (106/117 = 91%) and
one low (1/16 = 6%) relative occurrence frequency, reflecting different omission or
variation patterns. While various other patterns that suggest omissions, additions, or variations
can be observed from the figure, a closer look at particular verses is necessary to draw reliable
conclusions. Figure 3 illustrates the occurrence of different biblical names across different
variations of a particular verse. In the following, these patterns are analyzed more closely.
Figure 3: Occurrences of different biblical names across different variations of the same verse. A blue
box indicates the inclusion of the name in the verse within a particular manuscript.
(Panels by verse: Matt.1.15, Rom.16.3, John.11.5, John.1.38, Acts.17.34, Acts.13.44, John.20.15, John.11.45, 1.Cor.16.19. Characters shown: Preiscas, Prisca, Eleazar, Mary, Martha, Jesus, John, Damaris, Paul. Axes: Biblical Character; Manuscript (GA Number).)
4.3. Examples of Additions, Omissions, and Variations of Names
During the iterative search and the follow-up analysis of name variations, we found some
cases, which are discussed below.
We found Paul to be inserted in Acts 13:44 by Codex Bezae (majuscule 05). This finding is
consistent with both current text-critical hand editions, NA28 [9] and ECM [10]. On the
other hand, the woman Damaris is omitted from Codex Bezae in Acts 17:34, which is also evident
from NA28 and ECM. In minuscule 1, we found a potentially feminine variant of the name
Epaphroditus in Phil 2:25, namely επαφροδιτα. The name επαφροδιτα is probably
a copyist’s error, since the immediate context does not allow for a feminine interpretation
(the apposition is τον αδελφον, which is a grammatically clearly masculine word, namely
“the brother”). We were also able to locate the already known variant, discussed in [11],
of the forms Prisca and Preiscas in Rom 16:3 (P46 and 03) and 1 Cor 16:19 (P46). Another
finding is the simultaneous occurrence of Prisca and Preiscas in 03, as corrections were made
in a manuscript. We can confirm that Martha (e.g., P66) and Mary (e.g., 038, i.e., Θ, Codex
Koridethi) are interchanged in John 11:5, which has been examined before [12] and after [13]
the publication of NA28. We can also confirm the variation of Mary and Martha (attested in
213, noted in the current preliminary online ECM of John) in John 11:45. Yet another variant
is found in minuscule 841, which speaks of both Mary and Martha. We also found a striking
variation in the minuscule 2575 in John 1:38 which has, to our knowledge, not yet been taken
into account in textual criticism: John is written here instead of Jesus. Further, we were able to
locate yet another omission in Matt 1:15, where Eleazar (ελεαζαρ) is skipped in the minuscule
2597. This verse is part of a genealogy in which copyists have probably slipped in the line with
their eyes, because exactly the same verb form and article (homeoteleuton) appears between
known and possible unknown examples of omissions, additions, and variations which we show
father and child each time (“Eliud begat Eleazar. Eleazar begat Mattan. Mattan begat Jacob”)
as can be seen in Figure 4. A curious case is the addition of Mary in Jesus’s direct speech in
John 20:15 (minuscule 2106). Here Jesus addresses Mary by her name, which is not evident in
any other manuscript of our dataset. On further inspection, this addition might be
another case of a copyist slipping between lines of text, as the direct speech is introduced in
both verses with exactly the same words (“Jesus said to her”).
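The classification used in these examples can be sketched as a set comparison between the names found in a reference text and those found in a single manuscript. The following is a minimal sketch under stated assumptions: the function name compare_names, the reference lists, and the witness lists are hypothetical toy data chosen to mirror the cases above, and they are not taken from the published corpus.

```python
def compare_names(reference, witness):
    """Return names added to and omitted from a witness relative to a reference."""
    reference, witness = set(reference), set(witness)
    return {
        "additions": sorted(witness - reference),
        "omissions": sorted(reference - witness),
    }


# Toy data (illustrative only, not taken from the published corpus):
# names found per verse in a reference text and in one witness.
reference = {"Acts 13:44": [], "Acts 17:34": ["Damaris", "Dionysius"]}
witness_05 = {"Acts 13:44": ["Paul"], "Acts 17:34": ["Dionysius"]}

report = {
    verse: compare_names(reference[verse], witness_05[verse])
    for verse in reference
}
for verse, diff in report.items():
    print(verse, diff)
```

Spelling variations of the same name would have to be normalized to one label before such a comparison, otherwise each variant would be reported as one omission plus one addition.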
(a) Handwriting of Matt 1:15 in Majuscule 045 [14] (image)
Line | (b) Transcription of Matt 1:15 in Majuscule 045 | (c) Transcription of Matt 1:15 in Minuscule 2597
1 | ελιουδ | ελιουδ
2 | δε εγεννησε τον | δε εγεννησε τον
3 | ελεαζαρ ελεαζαρ | ⋮
4 | δε εγεννησε τον | ⋮
5 | ματθαν ματθαν | ματθαν ματθαν
6 | δε εγεννησε τον ια | δε εγεννησε τον
7 | κωβ | ιακωβ
Figure 4: Depiction of how a line slip due to a homeoteleuton could have happened: line 2 and line 4
are identical.
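The line slip depicted in Figure 4 can also be checked mechanically. The sketch below is an illustration only and not part of the published code: the hypothetical function find_eye_skip treats the longer reading as if it were the exemplar, isolates the single contiguous run of words missing from the shorter reading, and reports the words that the omitted run shares with the text immediately preceding it. A non-empty shared ending is compatible with a homeoteleuton, but it does not prove one.

```python
def find_eye_skip(exemplar, copy):
    """Locate one contiguous omission and the ending it shares with the preceding text."""
    ex, cp = exemplar.split(), copy.split()
    if len(cp) >= len(ex):
        return None  # nothing was omitted
    # Longest common prefix of both readings.
    p = 0
    while p < len(cp) and ex[p] == cp[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < len(cp) - p and ex[-1 - s] == cp[-1 - s]:
        s += 1
    if p + s != len(cp):
        return None  # the readings differ by more than one contiguous omission
    omitted = ex[p:len(ex) - s]
    before = ex[:p]
    # Count the words shared by the end of the omission and the end of the preceding text.
    k = 0
    while k < min(len(before), len(omitted)) and before[-1 - k] == omitted[-1 - k]:
        k += 1
    return {
        "omitted": " ".join(omitted),
        "repeated": " ".join(omitted[len(omitted) - k:]),
    }


# Readings of Matt 1:15 as transcribed in Figure 4.
majuscule_045 = (
    "ελιουδ δε εγεννησε τον ελεαζαρ ελεαζαρ δε εγεννησε τον "
    "ματθαν ματθαν δε εγεννησε τον ιακωβ"
)
minuscule_2597 = "ελιουδ δε εγεννησε τον ματθαν ματθαν δε εγεννησε τον ιακωβ"

result = find_eye_skip(majuscule_045, minuscule_2597)
print(result)
```

For these two readings the omitted run ends in δε εγεννησε τον, the same words that precede it, which matches the repetition highlighted in Figure 4.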
5. Related Work
A work similar to ours is [15], which presents a registry of Hebrew names and an analysis of
name occurrences within the lists found in the Hebrew Bible book of Ezra–Nehemiah. Subsequently,
[16] utilized this registry to establish the “Ancient Hebrew Personal Names” database.
We found that a thorough, machine-readable, or queryable compilation of the names in the
Greek New Testament is not readily accessible. Although efforts, such as those by
FactGrid, have been made to compile such lists, they often lack completeness, Greek spelling of
names, variations in name spelling, or comprehensive coverage across manuscripts. Our dataset
is positioned to complement existing initiatives with entries addressing these shortcomings.
Furthermore, there are discussions of textual variants that downplay the roles of women [17], debates
concerning the gender attribution of certain names [18] [19], and inquiries into the textual
traditions containing additions such as Martha of Bethany [13]. These discussions highlight
the complexities surrounding omissions, additions, and variations in name usage and warrant
further scholarly attention, for which our data can be of use.
When it comes to the analysis of textual variation in revision histories, some work has been
done in the context of Wikipedia, where the main focus was the identification of vandalism
and biased statements based on information about the corresponding editor. For instance, [20]
identifies different revision patterns on a set of almost 7000 Wikipedia article revisions.
6. Limitations
The transcription process is not yet automated and will probably remain largely manual work
in the future. This makes it all the more important that transcribers adhere to certain rules
and guidelines such as [21] and [22] to maintain conformity and reusability, as well as to guarantee
completeness and accuracy. However, precisely this reusability is currently a problem, as
different transcribers do not fully adhere to the above guidelines, and the guideline version used
is not mentioned in the transcript files. As a result, for example, it is often not indicated from
which source a ‘supplied’ letter originates, or why a gap occurs in the text.
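Such gaps in the markup can be listed automatically. The following minimal sketch assumes TEI-style transcriptions in which supplied text carries a source attribute and gaps carry a reason attribute; the XML fragment, the attribute values, and the function name audit are hypothetical and written for this illustration only, so they should be checked against the guideline version actually used.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by the (assumed) transcription files.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical transcription fragment, written for this sketch only.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0" n="example">
  <w>ελι<supplied reason="illegible">ουδ</supplied></w>
  <w>δε</w>
  <gap unit="char" extent="3"/>
  <w><supplied source="na28" reason="illegible">τον</supplied></w>
</ab>"""


def audit(xml_text):
    """List supplied text without a source and gaps without a reason."""
    root = ET.fromstring(xml_text)
    issues = []
    for element in root.iterfind(".//tei:supplied", TEI):
        if "source" not in element.attrib:
            issues.append(("supplied without source", "".join(element.itertext())))
    for element in root.iterfind(".//tei:gap", TEI):
        if "reason" not in element.attrib:
            issues.append(("gap without reason", element.get("extent", "")))
    return issues


issues = audit(SAMPLE)
for kind, detail in issues:
    print(kind, detail)
```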
As we conduct a string search for names on the transcription corpus, we get a certain number
of false positives, which result in falsely negative entries in the list of occurrences. For example,
the accusative form of Zeus (‘δια’) generates many such entries, as it is also
a preposition (in English: via, by, for, into, over, to). One could argue that certain verses
in which this occurs should be excluded from the name search. However, we want to include all
possible changes, including additions, omissions, and variations that may occur in only one verse.
For this reason, we keep these entries in the dataset for later analysis and removal.
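The ambiguity described above can be made explicit at search time instead of being resolved by exclusion. The sketch below is a simplified illustration and not the published implementation: the name-form list, the set of ambiguous forms, the function name find_names, and the example phrase are all hypothetical. Every hit is kept, and hits on forms that coincide with common words are flagged for later review.

```python
import re

# Hypothetical excerpt of a name-form list: surface form -> name.
NAME_FORMS = {"ιησους": "Jesus", "ιησου": "Jesus", "δια": "Zeus", "μαρια": "Mary"}
# Forms assumed to coincide with common words such as prepositions.
AMBIGUOUS = {"δια"}


def find_names(verse_text):
    """Return every token that matches a known name form, flagging ambiguous forms."""
    hits = []
    for position, token in enumerate(re.findall(r"\w+", verse_text.lower())):
        if token in NAME_FORMS:
            hits.append({
                "position": position,
                "form": token,
                "name": NAME_FORMS[token],
                "ambiguous": token in AMBIGUOUS,
            })
    return hits


# Constructed example phrase, not a quotation from a manuscript.
hits = find_names("ο ιησους ηλθεν δια της γαλιλαιας")
for hit in hits:
    print(hit)
```

Keeping the flagged hits preserves potential additions, omissions, and variations that occur in only one verse, while still allowing them to be filtered out in a later analysis.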
While the string search does not, in general, allow the disambiguation of different (or the same)
persons, the corpus described in this paper enables establishing hypotheses about the usage
of different names for the same person and the subsequent quantitative analysis of mention
patterns, for instance, by correlation. This is the case, for example, for Prisca and Preiscas, as
there are discussions about a certain spelling being both feminine and masculine and about whether the
supposedly masculine form of the name could be just another spelling variation of the feminine
form.
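A correlation of mention patterns can be sketched as follows. The example is a minimal sketch with hypothetical toy data: the manuscript labels and the presence vectors are invented for illustration and do not reflect the corpus. Each vector records whether a form is attested in a given verse in each manuscript, and the Pearson correlation of two such binary vectors (the phi coefficient) is computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences (phi for 0/1 data)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # undefined if one form occurs everywhere or nowhere
    return covariance / sqrt(var_x * var_y)


# Toy data (hypothetical): 1 = form attested in the verse in that manuscript.
manuscripts = ["A", "B", "C", "D", "E", "F", "G"]
prisca = [1, 1, 0, 1, 0, 1, 1]
preiscas = [0, 0, 1, 0, 1, 0, 1]

r = pearson(prisca, preiscas)
print(r)
```

A strongly negative value indicates that the two forms tend to exclude each other across manuscripts, which is the pattern one would expect if they are alternative spellings for the same person. Manuscripts in which both forms occur, for instance because of corrections, weaken the correlation.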
7. Conclusions and Future Work
In this paper, we present a novel dataset which was collected by manually integrating and
semantically enriching different public data sources for the study of omissions, additions, and
variations of biblical names in the Greek New Testament. To this end, we illustrate the diverse
origins of transcriptions of Ancient Greek New Testament manuscripts and describe
the process of compiling transcriptions in detail. With the presented corpus of transcriptions
and occurrences of names, a step has been taken towards the automatic creation of hypotheses
for textual criticism. For instance, by correlating patterns of occurrences of names of apparently
different characters, the established corpus allows the investigation of name variations that go
beyond plain grammatical variants. Using well-known examples from the literature, we were able to
reproduce already known additions, omissions, and variations of name occurrences in our data.
Moreover, we discovered a case of an addition of Mary in John 20:15 that has not yet been discussed.
While the corpus can be used in its current form for the illustrated analyses, we are aware
of some limitations, including false positives resulting from similarities between names and
prepositions. To address this issue, we plan to use machine learning methods such as
part-of-speech (POS) tagging and named entity recognition (NER). This requires manual
annotation for training and evaluation, as well as particular attention to the drawbacks
of such methods. To make the data semantically meaningful, accessible, and explorable for
further research, we plan to create a knowledge graph from the provided dataset for FAIR
publication. However, to the best of our knowledge, there are currently no ontologies for the
semantic description of biblical names and characters in the New Testament. We therefore
plan to develop an appropriate ontology.
Used Software, Data, and Code Repositories
We have used Python (v3.12.1) and R (v4.3.1) for retrieving, processing, analyzing, and plotting
data. Notable packages in use are: beautifulsoup4 (v4.12.3), jupyter (v1.0.0), notebook (v7.0.6),
pandas (v2.1.4), sparqlwrapper (v2.0.0), as well as tidyverse (v2.0.0).
Data generated during this project is made available [23] on Zenodo. The source code of this
project is published on GitHub: https://github.com/chr-werner/SemDH2024-GreekNewTestamentNames.
Acknowledgments
We would like to thank Jan Krans-Plaisier and Peter-Ben Smit for their help in familiarising us
with the topic and for their insights. Additionally, we thank Corinna Stratmann for her help
in browsing dictionaries in search of relevant entries. This work was funded by the Deutsche
Forschungsgemeinschaft (DFG, German Research Foundation) under project number 513300936.
References
[1] K. Leggett, G. S. Paulson, How Many Greek New Testament Manuscripts Are There
REALLY? The Latest Numbers, 2023. URL:
https://ntvmr.uni-muenster.de/intfblog/-/blogs/how-many-greek-new-testament-manuscripts-are-there-really-the-latest-numbers.
[2] Special Collections and Archives, Trexler Library. Muhlenberg College, P. Oxy. 1227: St.
Matthew’s gospel, xii., online, 2015. URL:
https://library.artstor.org/#/asset/SS7730556_7730556_9313349, accessed 2024-03-11.
[3] The Principio Project, The International Greek New Testament Project, Papyri, Majuscules,
Minuscules, and Lectionaries of John, online, 2024. URL:
https://itseeweb.cal.bham.ac.uk/iohannes/transcriptions/, accessed 2024-03-04.
[4] Institute for Textual Scholarship and Electronic Editing Birmingham, Electronic Resources
for the Textual Tradition of the Epistles of Paul, online, 2023/2024. URL:
https://itseeweb.cal.bham.ac.uk/epistulae/, accessed 2024-03-04.
[5] W. Bauer, Griechisch-deutsches Wörterbuch zu den Schriften des Neuen Testaments und
der frühchristlichen Literatur, 6., völlig neu bearbeitete Auflage, Walter de Gruyter, Berlin,
2012. Frühere Auflage unter dem Titel: Bauer, Walter: Griechisch-deutsches Wörterbuch
zu den Schriften des Neuen Testaments und der übrigen urchristlichen Literatur.
[6] J. P. Louw, E. A. Nida, Greek-English Lexicon of the New Testament, volume 1, United
Bible Societies, New York, 1988.
6
https://github.com/chr-werner/SemDH2024-GreekNewTestamentNames
[7] Text Encoding Initiative, TEI: Guidelines for Electronic Text Encoding and Interchange, P5
Version 4.7.0, revision e5dd73ed0, online, 2023. URL:
https://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html, accessed 2024-03-04.
[8] B. J. Collins, B. Buller, J. F. Kutsko, The SBL Handbook of Style, SBL Press, 2014.
doi:10.2307/j.ctt14bs6ct.
[9] B. Aland, K. Aland, J. Karavidopoulos, C. M. Martini, B. M. Metzger (Eds.), Novum
Testamentum Graece, 28th ed., Deutsche Bibelgesellschaft, Stuttgart, 2012.
[10] H. Strutwolf, G. Gäbel, A. Hüffmeier, G. Mink, K. Wachtel (Eds.), Die Apostelgeschichte:
The Acts of the Apostles: Novum Testamentum Graecum: Editio Critica Maior, Deutsche
Bibelgesellschaft, Stuttgart, 2017.
[11] D. A. Kurek-Chomycz, Is There an “Anti-Priscan” Tendency in the Manuscripts? Some
Textual Problems with Prisca and Aquila, Journal of Biblical Literature 125 (2006) 107.
doi:10.2307/27638349.
[12] K. von Tischendorf, Novum Testamentum Graece: Ad Antiquissimos Testes Denuo
Recensuit Apparatum Criticum Omni Studio Perfectum Apposuit Commentationem Isagogicam,
volume I of Editio Octava Critica Maior, Giesecke & Devrient, Leipzig, 1869.
[13] E. Schrader, Was Martha of Bethany Added to the Fourth Gospel in the Second Century?,
Harvard Theological Review 110 (2017) 360–392. doi:10.1017/s0017816016000213.
[14] Library of Congress, Collection of Manuscripts – Monastery of Dionysios 55. (old 10).
(Greg. 045, Ω). Four Gospels. 8th/9th cent. 259 f., 2024. URL:
https://www.loc.gov/resource/amedmonastery.00271050008-ma/?sp=12&r=0.51,0.18,0.177,0.153,0, accessed 2024-03-04.
[15] A. Frank (Ed.), Asaf, Juda, Hatifa - Namen und Namensträger in Esra/Nehemia, number 78
in Stuttgarter Biblische Beiträge (SBB), Verlag Katholisches Bibelwerk, Stuttgart, 2020.
[16] A. Frank, H. Rechenmacher, Morphologie, Syntax und Semantik Althebräischer
Personennamen, Universitätsbibliothek der Ludwig-Maximilians-Universität München, 2020.
doi:10.5282/UBM/EPUB.73364.
[17] R. G. Fellows, Early Textual Variants That Downplay the Roles of Women in the Bethany
Account, Textual Criticism 28 (2023) 67–82.
[18] E. J. Epp, Junia: The First Woman Apostle, Fortress Press, Minneapolis, 2005.
[19] R. G. Fellows, Early Sexist Textual Variants, and Claims That Prisca, Junia, and Julia Were
Men, The Catholic Biblical Quarterly 84 (2022) 252–278.
[20] Z. Ma, J. Tao, J. Hu, The dynamics of Wikipedia article revisions: an analysis of revision
activities and patterns, International Journal of Data Mining, Modelling and Management
9 (2017) 298. doi:10.1504/ijdmmm.2017.088415.
[21] H. Houghton, C. Smith, IGNTP guidelines for XML transcriptions of New Testament
manuscripts (version 1.6), 2023. URL: http://epapers.bham.ac.uk/4301/.
[22] A. C. Myshrall, R. Kevern, H. Houghton, IGNTP guidelines for the transcription of
manuscripts using the Online Transcription Editor, 2020. URL:
http://epapers.bham.ac.uk/3436/.
[23] C. Werner, F. Krüger, Z. Shoukry, S. Al-Suadi, A Corpus of Biblical Names in the Greek New
Testament to Study the Additions, Omissions, and Variations across Different Manuscripts,
2024. doi:10.5281/zenodo.10985520.