An Information System for Biblical Manuscripts Paratexts: Modeling, Implementation, and Future Directions

Andrea Brunello1, Emanuela Colombi1, Matteo Raffin2 and Nicola Saccomanno2,*

1 Department of Humanities and Cultural Heritage, University of Udine, Palazzo Caiselli, Vicolo Florio 2, 33100 Udine, Italy
2 Department of Mathematics, Computer Science, and Physics, University of Udine, Via delle Scienze 206, 33100 Udine, Italy

Abstract

Paratexts, such as prologues, summaries, prefaces, and annotations, shape the presentation, interpretation, and transmission of texts across audiences and periods. Their study provides critical insights into the historical, philological, and socio-cultural dimensions of manuscript production, use, and dissemination. Yet, a comprehensive analysis of Latin biblical paratexts remains lacking despite notable efforts on specific subsets, such as Marilena Maniaci’s research on Atlantic Bibles and Chiara Ruzzier’s studies on 13th-century portable Bibles. This article contributes to addressing this gap by presenting an information system for managing paratexts in medieval Latin biblical manuscripts. Our contribution is twofold: (1) we propose a conceptual model of the domain of medieval Latin biblical manuscripts paratexts to standardize the field and support future research; and (2) we implement such a model through a relational database, which acts as the core of an information system for documenting and analyzing paratexts. Its open-access prototype, already available, facilitates data organization and analysis, enabling prospective advanced applications, including artificial intelligence techniques.

Keywords

Biblical manuscripts, Paratexts, Conceptual modeling, Relational databases

1. Introduction

When we think of the Bible, we often envision The Book par excellence, shaped by centuries of interpretation as a singular manifestation of divine word.
However, its etymology, derived from the Greek ta biblia (“the books”), points to a plural textuality, while the history of biblical translations (from Hebrew, for what we call the Old Testament, into Greek, and subsequently into other ancient languages, such as Latin, and eventually into various modern languages) reveals transformations that challenge this presumed unity. These transformations result from the adaptation of the text to new historical and cultural contexts, as well as misunderstandings or textual innovations linked to the manuscript copying process.

In this work, we address the definition and the current and prospective development of an information system, with a relational database at its core, designed for the management of these textual discontinuities within Latin biblical manuscripts, focusing on two key aspects: (i) the structure of their content and (ii) specific types of paratexts.

As for the first point, the structural analysis of manuscripts reveals that not all exemplars contain the same biblical books in the same order, as the official Catholic canon was only definitively established at the Council of Trent in 1546. The manuscript tradition under investigation reflects the tension between a “short” canon, aligned with the Hebrew Bible and supported by Jerome in line with his preference for the Hebraica veritas, and a “long” canon, corresponding to the Greek Septuagint and earlier Latin translations. By the 9th century, surviving manuscript evidence shows a dual development: the gradual establishment of Jerome’s revision and his canonical vision, and the rise of single- or two-volume Bibles (pandectae), requiring preliminary decisions about content and order. The inclusion of variable deuterocanonical books led to hybrid editorial solutions, traceable in manuscripts from earlier and later periods, which the database seeks to document. This historical-theological aspect is closely tied to a codicological one: in Late Antiquity and the early Middle Ages, the Bible was primarily a bibliotheca, a collection of approximately ten codices, often with stable but somewhat variable contents. The juxtaposition or overlapping of transmission lines frequently caused absences or duplications, particularly for books with unstable placement, and these phenomena were exacerbated by the need to recover content from lost or damaged volumes. Over time, the pandectae gradually organized this diversity into a unified format, although not without challenges. For these reasons, as we will see, our proposed database records the books present in each exemplar and their order, enabling analyses of the transformations in the biblical canon as reflected in the manuscript evidence.

IRCDL 2025: 21st Conference on Information and Research Science Connecting to Digital and Library Science, February 20-21 2025, Udine, Italy
* Corresponding author.
Email: andrea.brunello@uniud.it (A. Brunello); emanuela.colombi@uniud.it (E. Colombi); raffin.matteo@spes.uniud.it (M. Raffin); nicola.saccomanno@uniud.it (N. Saccomanno)
ORCID: 0000-0003-2063-218X (A. Brunello); 0000-0002-0384-6664 (E. Colombi); 0000-0001-5916-3195 (N. Saccomanno)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The second type of discontinuity addressed in the database concerns the paratexts of Latin biblical manuscripts, which can be broadly defined as elements that accompany the biblical text without strictly belonging to it, including prologues, chapter headings, initial and final titles, running headers, marginal numbering, and indications of book lengths. These elements are particularly suited to tracing the discontinuities of interest, as they tend to persist like “fossils” from one copy to another, even when they prove inconsistent or inadequate.
While Gérard Genette’s seminal works in the late 20th century defined paratexts as “thresholds” to the text characterized by authorial intention and responsibility [1, 2, 3], the study of medieval manuscript traditions requires adapting Genette’s approach to reflect the material uniqueness of each exemplar. This point has been highlighted by the reflections of Patrick Andrist on the terminology and ontology of paratexts in manuscript traditions [4, 5, 6], which emerged from the developments of the ERC-funded Paratexts of the Bible project, dedicated to Greek biblical manuscripts [7, 8, 6]. Paratexts mediate the text for diverse audiences across time, offering insights into the cultural and social roles of the Bible, the interaction between the implied reader (lector in fabula) and the actual reader, and the contexts of manuscript production, transmission, and use. This area of research, though increasingly prominent in recent decades, lacks a comprehensive study or multidisciplinary reflection for Latin biblical paratexts. Foundational contributions have emerged from the extensive research of Pierre-Maurice Bogaert [9, 10, 11, 12, 13, 14], the studies by Marilena Maniaci and Roberta Casavecchia on Atlantic and Beneventan Bibles [15, 16, 17, 18, 19, 20, 21], as well as from the research of María Adelaida Andrés Sanz on Spanish Bibles [22] and Chiara Ruzzier on 13th-century “Bibles portatives” [23], while the transmission of the Latin New Testament, including its paratextual aspects, has been explored by Hugh Houghton [24, 25]. However, a holistic study of the historical, philological, and socio-cultural significance of Latin biblical paratexts remains an important gap, particularly with regard to the books of the Old Testament. In this regard, the paratexts to the Octateuch in the Tours, Atlantic, and Beneventan Bibles are the subject of a recently funded PRIN PNRR project (see the acknowledgments section) to which the present research also belongs [26].
This article seeks to address this gap by presenting a twofold contribution. First, we propose a thorough modeling of the domain of medieval Latin biblical manuscripts paratexts, aiming to standardize the field and establish a foundational framework for future studies. Second, we actualize this modeling through an information system based on a relational database, specifically designed to document and analyze these paratexts. The database, which is already accessible in its prototype form, serves as a dynamic tool for organizing and querying data, laying the groundwork for future expansions and applications, including the usage of artificial intelligence techniques, such as Large Language Models (LLMs) [27], to facilitate the interaction, and advanced graphical user interfaces to allow both information retrieval and the addition of new material. The long-term objective is for the information system to become a key reference in the literature for this domain, potentially integrated with other sources (e.g., [28]).

In the literature, similar works to ours, though narrower in scope or focusing on a different domain, include the aforementioned database Paratexts of the Bible [7], which contains information on texts and paratexts found in Biblical manuscripts written in Greek. Currently, the dataset is primarily focused on Gospel books and is linked with a significant interoperability initiative to the Greek manuscript database Pinakes [29]. It is enriched with an extensive set of metadata, and the queries are based on the structure of Pinakes, but further search options are under development. Moreover, it will be essential to establish connections with complementary projects, including, in particular, the tool under development for Biblical manuscripts at Ludwig-Maximilians-Universität in Munich [30]. Additional insights are also offered by the PASSIM database, recently published as the result of the ERC project, led by Shari Boodts at the Radboud University of Nijmegen, on the manuscript tradition of medieval homiliaries [31]. While these represent a different manuscript typology, the database serves as an interesting model for managing fluid and complex textual content. It also demonstrates the potential of using new data analysis tools to identify similarities and divergences in the organisation of textual units, offering a distinct perspective and a tailored infrastructure to advance research in this evolving area.

Figure 1: Conceptual Entity-Relationship diagram. For the sake of simplicity, we omitted to report the full list of book (element) types and the complete set of relationships linking them to their respective book elements.
The paper is structured as follows: in Section 2, we present a conceptual modeling of the domain of medieval Latin biblical manuscripts; building on this foundational standpoint, Section 3 describes the design of a relational database, which serves as the core of an information system for managing these paratexts; Section 4 introduces the prototypical implementation of the database and provides usage examples to illustrate its core functionalities and to demonstrate how it can support research in the considered field. Finally, we conclude by evaluating the outcomes of this work and exploring directions for future developments.

2. Modeling Biblical manuscripts paratexts

The primary objective of our conceptual modeling is to document not only the biblical books contained in the analyzed manuscripts but also the sequence in which they are arranged, along with the accompanying prefatory material. These considerations, which form a central focus of our research, have guided the development of an entity-relationship (ER) diagram [32] centered on individual textual and paratextual units, shown in Figure 1. This approach allows for a finer level of granularity, beyond the level of the biblical book, enabling us to trace discontinuities and analyze the diverse ways these units are combined in individual manuscripts. Additionally, the model facilitates tracking of similarity relationships between manuscripts based on the presence, absence, and sequence of textual and paratextual units.

In the following, we present the main elements composing the diagram. Note that, for the sake of brevity, the diagram does not depict all entities and relationships. Instead, we present representative examples where the context makes them clear. For instance, we have omitted the full list of book (element) types and the complete set of relationships linking them to their respective book elements.
Another consideration is that the ER diagram can be easily extended to accommodate additional information not currently tracked, should it become relevant (e.g., details about manuscript authors). Finally, although we are aware of the framework provided by the IFLA Library Reference Model [33], we used specific and immediately understandable domain names instead of the generic entity and relationship names provided by IFLA standards, though future reconciliations between the names we used and IFLA ones remain possible.

Manuscript. It represents the physical manuscript, which is stored in one and only one repository and may contain one or more biblical books. In the case of fragmentary manuscripts preserved in different locations, the distinct units are recorded separately, highlighting their connections. The MS Identifier is a derived value which combines the repository WD code (see the paragraph containing the description of the entity Repository), Collection (defaulting to “MS” if the shelfmark consists only of a number), and Number. The Number works with the collection to uniquely identify manuscripts, and Digital edition links to an online digital reproduction, if available (optional). A manuscript is identified by the Siglum, a unique identifier within our model and internal to the project, as no universal list of biblical manuscript sigla currently exists. Siglum is essentially a shortened version of the attribute MS identifier. The list of manuscripts considered and their assigned sigla will be shared alongside the publication of the research outputs, making them universally identifiable to the scholarly community.

A manuscript is described by several attributes. The Place refers to where the manuscript was likely written or its earliest traceable location (optional).
The Initial year and Final year indicate its production time range, while Decorations and Musical notation are true/false attributes indicating the presence of decorative elements or musical notation, respectively. Writing style specifies the script style; other optional attributes record the Width and Height of each manuscript, with the Size (the sum of width and height) and the Proportion (i.e., the ratio of width to height) calculated automatically. The latter is expressed as a decimal number, increasing as the page’s shape approaches a perfect square (with a proportion of 1). These attributes are designed to provide an immediate visual impression of the manuscript and to facilitate quantitative codicological research. The attribute Notes provides additional details (optional).

Repository. It is the entity that represents the current physical location of the manuscript, typically a library or a conservation institution. Its key is the WD code, which consists of a unique alphanumeric code extracted from the Wikidata portal [34], enabling information interoperability. The other attributes are Name, City, and Country, which respectively represent the name of the institution, the city, and the country where it is located. A repository may store one or more manuscripts (relationship Stores).

Book. It represents a specific, “physical” biblical book within a manuscript. The modelling of this entity has proven to be crucial and has materialized in the concept of a book as a “container” encompassing both the biblical text itself and the accompanying paratexts, such as prologues and summaries. This approach enables a more comprehensive and historically accurate perspective on the manuscript: what are commonly referred to as biblical books are, in fact, the biblical texts, to which each exemplar adds further materials (elements) that belong to it and shape its interpretation.
Each book belongs to one and only one predefined kind, representing its title (e.g., Genesis), following the form established in the critical edition of the Vulgate edited by Robert Weber and Roger Gryson [35]. Uniquely identifying a book requires knowing both the manuscript in which it is contained and its sequence within that manuscript (attribute Book sequence number). A preliminary note concerns the current limitation of the census to the books of the Old Testament: the database is, however, designed to accommodate future expansions to include New Testament books.

Book element. It represents a generic, “abstract” component of a biblical book, categorized into one of three types: prologue, summary, or text. Unlike “physical” book elements, which may include paratextual or decorative attributes (e.g., the specific decorations or headings), the book element is defined solely by its textual content as standardised by the reference repertoires or edition. This abstraction allows for a clear distinction between the abstract, conventional textual content of a book element and its various physical representations (see relationship Includes), which may differ in their physical characteristics, such as text anomalies or varying paratexts, but share the same textual content identity. Each book element is uniquely identified by an ID based on its type and established domain references. For prologues, the ID corresponds to the numbering system of Friedrich Stegmüller’s Repertorium Biblicum [36]. For example, two distinct prologues for Genesis would each have a unique ID, distinguishing them. Conversely, a prologue with the same ID appearing in multiple books represents the same shared textual content.
Summaries (capitula) follow the classification system proposed by Donatien de Bruyne [37], where IDs consist of series defined by letters or sigla, to which we have added the abbreviation (as defined in the Weber-Gryson edition) of the associated biblical book (e.g., A_Gn for Genesis summaries in series A). This structure facilitates vertical searches across books, enabling the identification of series spanning multiple biblical books, while simultaneously allowing for the recognition of the type of summary prefixed to the same biblical book in different exemplars. The respective incipits and explicits have also been added as attributes for each element, using the standardized forms published in the Weber-Gryson edition for biblical texts, Stegmüller’s repertoire for prologues, and de Bruyne’s edition for summaries.¹ It was also necessary to establish predefined conventions for prologues and summaries: when these refer to groups of books (e.g., the prologue to the Pentateuch or to the collection of the Twelve Prophets), the element is “linked” to the first book of the group (in these cases, Genesis and Hosea, respectively).

Includes. These relationships (one for each kind of Book/Book element) track the many-to-many association between a specific manuscript’s book and its elements. We have defined that a book can contain multiple (typically, up to five) prologues, at most one summary, and exactly one text. Books may also exist as texts without prologues and/or summaries. While the book element entity represents an abstract component of a book, defined by its type and textual content (via the ID attribute), linking a book element to a book “materializes” it. This connection captures the attributes of the specific physical instance of the book element as it appears in a particular manuscript’s book. These attributes include the Element sequence order, which specifies the order of the element within the book.
The text always appears as the last element, while prologues and summaries can be arranged in any order. The Initial sheet and Final sheet indicate where the element begins and ends in the manuscript, defined by page numbers and column markers (e.g., “ra” for recto-column a, “vb” for verso-column b). The Initial heading and Final heading represent the opening and closing headings of the element, with the manuscript’s abbreviations expanded using mixed-case letters to improve searchability. The primary goal is to facilitate the identification of discontinuities that have been largely underexplored but can provide valuable insights into relationships between codices. Moreover, headings tend to crystallize during transmission, sometimes offering crucial information about earlier layers and the convergence of different traditions in the titles used to identify both biblical texts and paratexts (e.g., prologus, praefatio, argumentum, capitula, brevis, capitulatio, etc.). These attributes can also be marked as “om.” if they have been omitted for undetermined reasons, or as “om. lac.” in cases where the omission is due to physical damage (material lacuna). The Running title is an optional attribute that records the possible book title as written in the upper margin. Similarly, the Decoration initial letter is an optional attribute indicating the presence of a decorated initial letter, which may also be marked as “om.” and/or “lac.” if missing or damaged. Stichometry is another optional attribute that records, in Arabic numerals, the number of lines declared at the end of the text in certain manuscripts; the original form in Roman numerals is instead recorded as part of the final heading. Additional optional attributes include Incipit anomaly and Explicit anomaly, which record divergences in the beginning or end of the element compared to the standard reference text recorded in Book Element. Priority is given to the separate recording of all units provided with an ID: for instance, in the not uncommon case where a prologue is composed of the consecutive transcription of multiple prologues, presented as a single text, each textual unit is recorded separately, with the Notes field specifying that there is no break between these elements. Other anomalies, such as a total number of chapters differing from that indicated for a specific summary in de Bruyne’s edition, are documented in the Notes field. The attribute Marginal numbering records the presence of the marginal “capitulation” throughout the text and whether it corresponds to the initial summaries, with the letter K indicating the presence of chapter headings interspersed within the text as subtitles. The Junction attribute specifies whether the element ends at a gathering’s junction, with possible values including (other than null): “(x)” caesura with blank spaces, “?” possible caesura, “(?)” possible caesura with blank spaces, “/” confirmed caesura that includes the following incipit, “/?” possible caesura that includes the following incipit, “(/)” caesura that includes the following incipit and blank spaces, “(/?)” possible caesura that includes the following incipit and blank spaces, and “B” anomalous blank spaces at the end of a text. Finally, the Notes attribute is an optional textual field for providing additional information about the element.

¹ An additional enhancement that could provide a valuable service to database users would be the inclusion of the full text of prologues and summaries, though this would require verification regarding reproduction rights.

Figure 2: Restructured Entity-Relationship diagram.

3. Relational database development

In this section, we present the structure of the relational database system that implements the ER diagram of Figure 1. We chose to rely on a relational database rather than a NoSQL solution, such as a graph database, because our data is highly structured and relational databases offer fine-grained control over data consistency.

To define the database schema, we first restructured the ER diagram, removing elements that could not be directly mapped to a relational schema (in our case, generalizations and composite attributes). The result is shown in Figure 2. Notably, we replaced the composite attributes in all Includes relationships with their individual components. Next, we chose to retain the derived attributes MS identifier, Size and Proportion of the Manuscript entity.
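As a rough illustration of what retaining these derived attributes involves, the following Python sketch computes them from the stored ones. It is not the project's implementation: the class name, the space-separated layout of the MS identifier, the reading of Proportion as width divided by height, the rounding, and the placeholder Wikidata code are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Manuscript:
    """Hypothetical in-memory view of a manuscript record."""
    repository_wd_code: str            # Wikidata code of the holding repository
    number: str                        # shelfmark number
    collection: Optional[str] = None   # collection name, if the shelfmark has one
    width: Optional[float] = None      # page width (unit is an assumption)
    height: Optional[float] = None     # page height (same unit as width)

    @property
    def ms_identifier(self) -> str:
        # Collection defaults to "MS" when the shelfmark is only a number.
        # Joining the three parts with spaces is an assumption of this sketch.
        collection = self.collection or "MS"
        return f"{self.repository_wd_code} {collection} {self.number}"

    @property
    def size(self) -> Optional[float]:
        # Size is the sum of width and height; undefined if either is missing.
        if self.width is None or self.height is None:
            return None
        return self.width + self.height

    @property
    def proportion(self) -> Optional[float]:
        # Assumed reading: width / height, which approaches 1 for square pages.
        if self.width is None or self.height is None or self.height == 0:
            return None
        return round(self.width / self.height, 3)


# "Q0000000" is a placeholder, not a real repository code.
ms = Manuscript(repository_wd_code="Q0000000", number="12", width=300.0, height=400.0)
print(ms.ms_identifier, ms.size, ms.proportion)
```

Storing such values, rather than recomputing them in every query, keeps the example queries short at the cost of having to refresh them whenever width, height, or shelfmark data change.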
Finally, we removed the Book and Book element specializations, replacing them with attributes within the parent entities to preserve their distinctions.² This adjustment also allowed us to consolidate all the book-specific includes relationships into a single Includes relationship between Book and Book element. Upon closer examination of the data at our disposal, this proved to be indeed the correct choice, as there are cases where a book element inherently associated with a specific book type (e.g., a Genesis Prologue) was attached to a different book type (e.g., Deuteronomy) due to errors in the manuscript’s original assembly or its subsequent preservation.

Figure 3: Logical relational schema, showing the tables manuscript, repository, book, book_element, and includes with their attributes, data types, and key relationships.
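Cases of this kind can be surfaced with a simple join, sketched below in Python with an in-memory SQLite database standing in for the PostgreSQL prototype. The table and column names follow those used in the example queries of Section 4.1; everything else (the cut-down table definitions, the siglum X1, the text IDs T_Gn and T_Dt, and the toy rows) is hypothetical and only serves to make the example runnable.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL prototype (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table book (
        manuscript_siglum text,
        sequence_number   integer,
        book_type         text,
        primary key (manuscript_siglum, sequence_number)
    );
    create table book_element (
        id           text primary key,
        book_type    text,
        element_type text
    );
    create table includes (
        book_manuscript_siglum text,
        book_sequence_number   integer,
        book_element_id        text,
        element_sequence_order integer
    );
    -- Toy rows: siglum 'X1' and the text IDs 'T_Gn' / 'T_Dt' are hypothetical.
    insert into book values ('X1', 1, 'Genesis'), ('X1', 2, 'Deuteronomy');
    insert into book_element values
        ('285',  'Genesis',     'P'),
        ('T_Gn', 'Genesis',     'T'),
        ('T_Dt', 'Deuteronomy', 'T');
    insert into includes values
        ('X1', 1, 'T_Gn', 1),
        ('X1', 2, '285',  1),
        ('X1', 2, 'T_Dt', 2);
""")

# Elements whose own book type differs from the type of the book hosting them.
mismatch_query = """
    select i.book_manuscript_siglum, i.book_sequence_number,
           b.book_type as host_book_type,
           e.id as element_id, e.book_type as element_book_type
    from includes i
    join book b on (b.manuscript_siglum = i.book_manuscript_siglum
                    and b.sequence_number = i.book_sequence_number)
    join book_element e on e.id = i.book_element_id
    where e.book_type != b.book_type
    order by i.book_manuscript_siglum, i.book_sequence_number, i.element_sequence_order
"""
mismatches = conn.execute(mismatch_query).fetchall()
for row in mismatches:
    print(row)
```

In the toy data, the Genesis prologue attached to the Deuteronomy book is the only row returned; the same select statement should carry over to the real schema largely unchanged, since it relies only on columns that appear in the paper's own queries.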
While the original ER diagram represents the ideal scenario, our adjustment accommodates such discrepancies and allows us to keep track of them. Finally, according to a set of well-established mapping rules [32], we derived the database logical schema from the restructured diagram shown in Figure 2. The resulting relational schema, presented in Figure 3, illustrates the tables, attributes (along with their data types), and relationships that form the backbone of the database. This schema not only serves as a bridge between the conceptual design and the physical database implementation but also provides the database user with a clear map to guide their interaction with the system, ensuring a better understanding of its structure and functionality.

² Specifically, for the Book element, in addition to the type of the book (e.g., Genesis), encoded by the Book type attribute, we include the Element type attribute, which can take the values T (text), P (prologue), and C (summary).

4. Prototype of the system

A prototypical version of the relational database, implemented in PostgreSQL [38] and serving as the core of the prospective information system, is already freely accessible.³ The source code of our implementation, including the SQL Data Definition Language (DDL) for deploying the database, the raw data underlying the database instance (which is continuously expanding), the database data import script, and the definitions of several useful SQL queries, will be made available on the project’s GitHub page [40] upon paper acceptance.

As of the date of article submission, the database (continuously growing alongside the raw data) contains 23 repositories, 48 manuscripts, 1303 books, 251 book elements, and 2567 instances of the Includes relationship.

³ The system is accessible at http://158.110.146.222:8080/. Upon connecting, users are presented with a pgAdmin [39] web server interface that prompts for login credentials (username = tester_biblical@ai4ch.uniud.it, password = UXftJGM5eNMdPGZ). A read-only user account grants privileges to perform select operations on the public schema of the biblical database.

4.1. Exemplary interactions

The database supports a wide range of interactions, from basic queries to more advanced use cases. Below, we report some notable examples of SQL queries that can be directly run against the prototypical online implementation of the database. SQL user defined functions will be implemented to simplify user interaction.

Determine all summaries of Genesis. This type of query allows the identification of the various types of introductory paratexts (prologues and chapter headings) associated with the same biblical book, allowing for an examination of the diversity of editorial arrangements across different manuscripts.

Query & results: All summaries of Genesis

    select id as book_element_id
    from book_element
    where book_type='Genesis' and element_type='C';

    book_element_id
    A_Gn
    B_Gn
    C_Gn
    ...

Determine all summaries from series A. Such queries allow for the cross-sectional verification of the presence of capitula across the various biblical books identified by De Bruyne [37] with the same letter. The search can be restricted to a single biblical book by specifying its reference abbreviation in the query (e.g., _Gn) according to the conventions of the Weber-Gryson edition [35], or by filtering on the attribute book_type.

Query & results: All summaries from series A

    select id as book_element_id, book_type
    from book_element
    where element_type='C' and id like 'A\_%';

    book_element_id    book_type
    A_1 Mcc            1 Macchabeorum
    A_1 Sm_1 Rg        1 Samuhel_1 Regum
    A_2 Mcc            2 Macchabeorum
    ...                ...

Determine which prologues are associated with books named Genesis, and their respective frequencies. This query provides a ranked view of the distribution of paratexts (prologues or chapter headings) for a specific biblical book, emphasizing the relative prevalence of distinct editorial choices within the manuscript tradition.
Query & results: Prologues associated with books named Genesis

    with gbooks as (
        select *
        from book
        where book.book_type = 'Genesis'
    ), tmp_res as (
        select book_element.id as prologue_id
             , count(*) as absolute_occurrences
        from gbooks
        join includes on (gbooks.manuscript_siglum = includes.book_manuscript_siglum
                          and gbooks.sequence_number = includes.book_sequence_number)
        join book_element on includes.book_element_id = book_element.id
        where book_element.element_type = 'P'
        group by book_element.id
    )
    select tmp_res.*
         , round(tmp_res.absolute_occurrences/(select sum(absolute_occurrences) from tmp_res),2) as rel_occurrences
    from tmp_res
    order by rel_occurrences desc;

    prologue_id    absolute_occurrences    rel_occurrences
    285            27                      0.59
    284            14                      0.30
    290            2                       0.04
    ...            ...                     ...

Determine the initial and final headings of the book Canticum canticorum for all manuscripts in which it is present. This type of query allows for a synoptic visualization of the initial and final headings of a specific biblical book, enabling the identification of discontinuities and potential affinities, particularly in the case of more elaborate headings than the standard formula (incipit liber. . . explicit liber). These may contain valuable information; for instance, benevolent formulas (such as Deo gratias or Amen) could point to an earlier exemplar where such formulas marked the beginning or end of an independent volume, which was later incorporated into a larger or differently composed collection. The same query can be adapted to analyze the initial and final headings of both prologues and capitula.

Query & results: Initial and final headings of the book Canticum canticorum

    select I.book_manuscript_siglum as manuscript_siglum
         , I.initial_heading
         , I.final_heading
    from includes I
    join book B on (I.book_manuscript_siglum=B.manuscript_siglum
                    and I.book_sequence_number=B.sequence_number)
    where B.book_type='Canticum canticorum';

    manuscript_siglum    initial_heading            final_heading
    Am5                  INCIPIUNT CANTICA . . .    EXPLICIUNT CANTICA . . .
    Amt                  INCIPIT LIBER. . .         EXPLICIT LIBER . . .
    An2                  INCIPIUNT CANTICA . . .    EXPLICIUNT CANTICA . . .
    ...                  ...                        ...

Determine the relative order in which the books ‘1 Ezras,’ ‘2 Ezras (Neemia),’ ‘Iudith,’ ‘Hester,’ and ‘Tobias’ are presented in the manuscripts, considering only the manuscripts that contain at least one of these books. This more complex type of query addresses one of the fundamental research questions outlined earlier: the relationship between the canon expressed by individual biblical manuscripts and their arrangement, including codicological aspects, whether they are bibliothecae or pandectae, incorporating earlier partial collections. This aligns with reflections on the ‘modular’ structure of Atlantic Bibles discussed by [15] (pp. 54–56) and the requirements outlined by [30]. The example provided focuses on a group of biblical books characterized by significant instability in their presence and relative order. This instability can be traced back to Jerome’s editorial project for the Vulgate and his adherence to the Hebraica veritas. Tobit, Judith, and parts of Esther belong, in fact, to the so-called deuterocanonical books, included in the Greek Septuagint translation but absent from the Hebrew canon. However, Jerome agreed to translate them, indirectly validating their inclusion in Vulgate manuscripts. Nonetheless, their sequence oscillates [41], which also impacts the complex dossier of the book of Ezra [42, 43].
Query & results: Books relative order

    select B.manuscript_siglum as man_siglum
         , row_number() over (partition by B.manuscript_siglum order by B.sequence_number) as rel_order
         , B.book_type
         , min(initial_sheet_page_number) as i_sheet
         , min(initial_sheet_page_column) as i_column
         , max(final_sheet_page_number) as f_sheet
         , max(final_sheet_page_column) as f_column
    from book B
    join includes on (B.manuscript_siglum = includes.book_manuscript_siglum
                      and B.sequence_number = includes.book_sequence_number)
    where B.book_type in ('1 Ezras', '2 Ezras (Neemia)', 'Iudith', 'Hester', 'Tobias')
    group by B.manuscript_siglum, B.sequence_number;

    man_siglum   rel_order   book_type   i_sheet   i_column   f_sheet   f_column
    Amt          1           Tobias      701       va         708       vb
    Amt          2           Iudith      708       ra         720       rb
    Amt          3           Hester      720       ra         730       rb
    ...          ...         ...         ...       ...        ...       ...

Determine the difference between two manuscripts in terms of book ordering, considering only the books they have in common. We focus on a specific pair of manuscripts, identified by “Amt” and “Sg1.” First, we retrieve all books they have in common, beginning with those in “Amt.” For each shared book, we compute its relative position among the shared books only, and we also report its position in the complete sequence of the manuscript. We then repeat this procedure for the “Sg1” manuscript. Next, we pinpoint all cases where the two manuscripts present different books occupying the same relative position in their respective sequences. In each such case, we list the book as it appears in “Amt” and provide its full order of appearance, comparing it to its full order in “Sg1.” On the same row, we also record the counterpart book that “Sg1” places in the same relative position, along with the location of that book in the “Amt” manuscript. For the two manuscripts considered, the query returns 17 rows out of a total of 32 shared books.
Note that, starting from a similar query, it is possible to calculate an “index of diversity” between manuscripts, for instance, drawing inspiration from the Kendall tau rank distance [44], which represents the number of adjacent element swaps needed to transform one list into another.

Query & results: Manuscripts difference in terms of book ordering

    with manuscript_1 as (
        select book_type
             , row_number() over (partition by manuscript_siglum order by sequence_number) as rel_order
             , sequence_number as book_order
        from book
        where manuscript_siglum = 'Amt'
          and book_type IN (select book_type from book where manuscript_siglum = 'Sg1')
    ),
    manuscript_2 as (
        select book_type
             , row_number() over (partition by manuscript_siglum order by sequence_number) as rel_order
             , sequence_number as book_order
        from book
        where manuscript_siglum = 'Sg1'
          and book_type IN (select book_type from book where manuscript_siglum = 'Amt')
    )
    select m1.book_type as b1_m1
         , m1.book_order as b1_m1_order
         , m1b.book_order as b1_m2_order
         , m2.book_type as b2_m2
         , m2.book_order as b2_m2_order
         , m2b.book_order as b2_m1_order
    from manuscript_1 m1
    join manuscript_2 m2 on (m2.rel_order = m1.rel_order and m1.book_type < m2.book_type)
    join manuscript_2 m1b on m1b.book_type = m1.book_type
    join manuscript_1 m2b on m2b.book_type = m2.book_type;

    b1_m1   b1_m1_order   b1_m2_order   b2_m2         b2_m2_order   b2_m1_order
    Ionas   31            22            Psalmi        31            15
    Micha   32            23            Psalmus CLI   32            16
    Naum    33            24            Proverbia     33            17
    ...     ...           ...           ...           ...           ...

4.2. Broader research questions and interactions supported by the system

The previous exemplary queries illustrate how the system enables users to efficiently extract information about texts and paratexts. This functionality supports diverse research endeavors, including the study of transformations in the biblical canon, which are evident, for example, both in the selection of books included in a manuscript and in the order in which those books are arranged.
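The “index of diversity” inspired by the Kendall tau rank distance [44], suggested in Section 4.1, can be illustrated outside the database. The following Python sketch is not part of the project's code. It assumes that the book sequences of two manuscripts have already been retrieved (for example, as the book_type values of each manuscript ordered by sequence_number) and that each book type occurs at most once per manuscript. The function names and the sample sequences are hypothetical. The sketch counts the pairs of shared books whose relative order differs between the two manuscripts, which corresponds to the Kendall tau rank distance restricted to the shared books.

```python
from itertools import combinations


def kendall_tau_distance(order_a, order_b):
    """Count pairs of shared books whose relative order differs.

    Assumption: each book appears at most once in each sequence.
    """
    shared = set(order_a) & set(order_b)
    # Position of each shared book in manuscript B, counted among shared books only.
    rank_b = {book: i for i, book in enumerate(b for b in order_b if b in shared)}
    # Shared books in the order in which manuscript A presents them.
    seq_a = [book for book in order_a if book in shared]
    # A pair is discordant when A places x before y but B places y before x.
    return sum(1 for x, y in combinations(seq_a, 2) if rank_b[x] > rank_b[y])


def normalized_kendall_tau(order_a, order_b):
    """Scale the distance to [0, 1] by the number of pairs of shared books."""
    n = len(set(order_a) & set(order_b))
    pairs = n * (n - 1) // 2
    return kendall_tau_distance(order_a, order_b) / pairs if pairs else 0.0


# Hypothetical book sequences, for illustration only (not taken from the database).
ms_1 = ["Tobias", "Iudith", "Hester", "1 Ezras"]
ms_2 = ["Hester", "Tobias", "Iudith", "Psalmi"]

print(kendall_tau_distance(ms_1, ms_2))    # 2 discordant pairs among the 3 shared books
print(normalized_kendall_tau(ms_1, ms_2))  # 2 out of 3 possible pairs
```

Dividing by the number of pairs of shared books yields a value between 0 (same order) and 1 (reversed order), so that pairs of manuscripts sharing different numbers of books remain comparable.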
Furthermore, the system can support automated content-similarity evaluations across a large number of manuscripts, providing new opportunities to investigate the historical and codicological factors that shape the arrangement of books within exemplars. Below, we outline in more detail some research questions and interactions that can be effectively addressed using the system.

Tracking textual and paratextual features: Documenting the sequence and location of (para)textual elements within individual manuscripts and recording the presence or absence of decorative elements, such as initials or other embellishments. This includes examining initial and final headings for patterns in textual transmission, including formulaic incipit and explicit expressions.

Stichometric and quantitative textual analysis: The inclusion of data such as stichometry, the presence of a decorated initial (the Decoration initial letter boolean attribute), and Junction (with symbols differentiated by the degree of certainty and the presence of blank spaces) facilitates computational processing and quantitative analysis.

Cross-manuscript comparison: Facilitating the discovery of relationships between (groups of) manuscripts through shared features in headings, capitula, and structural design, to study the divergence and convergence in traditions. Advanced tools, including tokenization and semantic comparison of headings and textual strings, may also enable similarity computation, for instance by means of machine-learning models or string-embedding techniques.

Full-text search and annotations: Supporting full-text search capabilities to identify anomalies in headings, numbering, or other textual elements.
Textual transmission and scribal practices: Investigating how formulaic incipit and explicit expressions evolve across manuscript traditions and how scribes exercise autonomy in reproducing or adapting paratextual elements, offering insights into the historical dynamics of manuscript preparation and adaptation.

Codicological studies: Studying structural features, such as junctions between textual elements and manuscript gatherings, as well as the size and proportions of manuscripts, provides valuable insights into codex production across different times and regions. This approach supports the identification of distinctive features within various manuscript traditions and facilitates the application of quantitative codicological analysis.

5. Conclusions and future developments

In this work, we presented the conceptualization of the domain of biblical manuscript paratexts, followed by the design and development of a relational database system for managing this kind of data. The conceptualization effort not only guided the system’s development but also contributes to the standardization of the field, laying a robust foundation for future studies. While this is an ongoing project, a prototype of the relational database, intended to serve as the core of a comprehensive information system, is already freely accessible online.

As for future work, in addition to the research avenues already discussed, plans include the development of a graphical user interface (GUI) for interacting with the database. This interface will be further enhanced by incorporating artificial intelligence techniques, such as large language models (LLMs), to enable natural interaction with the stored data. For instance, users could query the system using natural language or engage in conversational interactions to explore the data more intuitively, leveraging text-to-SQL systems [45].
Acknowledgments

The research is part of the DOBiPS – Data Oriented Biblical Paratexts Studies project, awarded for the 2023–2025 biennium to the research units of the Universities of Udine (P.I. E. Colombi) and Cassino (P.I. R. Casavecchia) under the competitive PRIN PNRR call – National Recovery and Resilience Plan, Mission 4 Education and Research, funded by the European Union Next-GenerationEU (protocol no. P2022ZW4AW). Nicola Saccomanno also acknowledges the support from the Interconnected Nord-Est Innovation Ecosystem (iNEST), which received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.5 – D.D. 1058 23/06/2022, ECS00000043).

References

[1] G. Genette, Introduction à l’architexte, Paris: Seuil, 1979.
[2] G. Genette, Palimpsestes. La littérature au second degré, Paris: Seuil, 1982.
[3] G. Genette, Seuils, Paris: Seuil, 1987.
[4] P. Andrist, Toward a definition of paratexts and paratextuality: The case of ancient Greek manuscripts, De Gruyter, Berlin, Boston, 2018, pp. 130–150. doi:10.1515/9783110603477-010.
[5] P. Andrist, The limits of paratexts/paracontents in manuscripts: Revisiting old questions and posing new ones, COMSt Bulletin 8 (2022) 215–233.
[6] P. Andrist, Asymmetrical descriptions of biblical manuscripts: A key to the success of the Paratexts of the Bible (ParaTexBib) project and its database, Bibliothek und Wissenschaft 55 (2022) 63–78.
[7] M. Wallraff, et al., Manuscripta Biblica, 2020. URL: https://www.manuscripta-biblica.org, accessed: 2024-11-15.
[8] P. Andrist, M. Wallraff, Paratexts of the Bible: A new research project on Greek textual transmission, Early Christianity (2015) 237–243.
[9] P.-M. Bogaert, Les particularités éditoriales des Bibles comme exégèse implicite ou proposée. Les sommaires ou capitula donatistes, in: I. Iudaicum (Ed.), Lectures bibliques. Colloque du 11 nov. 1980, Publications de l’Institutum Iudaicum, Bruxelles, 1982, pp. 7–21.
[10] P.-M. Bogaert, La Bible latine des origines au moyen âge. Aperçu historique, état des questions, Revue Théologique de Louvain 19 (1988) 137–159; 276–314.
[11] P.-M. Bogaert, Aux origines de la fixation du canon. Scriptoria, listes et titres. Le Vaticanus et la stichométrie de Mommsen, in: J.-M. Auwers, H. D. Jonge (Eds.), The Biblical Canons, Louvain, 2003, pp. 153–176.
[12] P.-M. Bogaert, The Latin Bible, c. 600 to c. 900, in: R. Marsden, E. A. Matter (Eds.), The New Cambridge History of the Bible, Volume 2, Cambridge University Press, Cambridge, 2012, pp. 69–92.
[13] P.-M. Bogaert, The Latin Bible, in: J. C. Paget, J. Schaper (Eds.), The New Cambridge History of the Bible, Volume 1, Cambridge University Press, Cambridge, 2013, pp. 505–526.
[14] P.-M. Bogaert, Entre canon(s) et textes bibliques. Que traduire?, Recherches de Science Religieuse 106 (2018) 53–71.
[15] M. Maniaci, La struttura delle Bibbie atlantiche, in: M. Maniaci, G. Orofino (Eds.), Le Bibbie atlantiche. Il Libro delle Scritture tra monumentalità e rappresentazione, Catalogo della mostra, Milano, 2000, pp. 47–60. P. 47.
[16] M. Maniaci, Written evidence in the Italian Giant Bibles: Around and beyond the sacred text, in: L. I. Lied, M. Maniaci (Eds.), Tracing Annotations and Annotation Practices in Late Antique and Medieval Biblical Manuscripts, De Gruyter, Berlin, Boston, 2018, pp. 85–100. URL: https://doi.org/10.1515/9783110603477-006. doi:10.1515/9783110603477-006.
[17] M. Maniaci, The structure of Atlantic Bibles, Trends in Statistical Codicology (2021). URL: https://api.semanticscholar.org/CorpusID:243901180.
[18] M. Maniaci, Chapter lists in Giant and Beneventan Bibles: Some preliminary remarks, in: T. Bernheimer, R. Vollandt (Eds.), Synopses and Lists. Textual Practices in the Pre-Modern World, Cambridge, 2023, pp. 295–297.
[19] R. Casavecchia, M. Maniaci, G. Orofino, Montecassino e la Bibbia. Forme, contenuti, decorazione, in: La Bibbia a Montecassino / The Bible at Montecassino, Turnhout, 2021, pp. 12–68.
[20] R. Casavecchia, Bibbia e paratesti a Montecassino: I capitula al libro della Genesi, Scripta. An International Journal of Codicology and Palaeography 16 (2023) 61–94.
[21] R. Casavecchia, M. Maniaci, Partial Bibles in southern Italy: The case of Montecassino, in: P. Andrist, E. Attia, M. Maniaci (Eds.), From the Thames to the Euphrates. Intersecting Perspectives on Greek, Latin and Hebrew Bibles / De la Tamise à l’Euphrate. Regards croisés sur les Bibles grecques, latines, et hébraïques, volume 9 of Manuscripta Biblica, De Gruyter, Berlin, 2023, pp. 83–102.
[22] A. Sanz, M. Adelaida, Les préfaces de la Bible latine dans le haut Moyen âge hispanique, Annuaire de l’École pratique des hautes études (EPHE), Section des sciences historiques et philologiques. Résumés des conférences et travaux (2019) 205–221.
[23] C. Ruzzier, Entre Université et ordres mendiants: La production des bibles portatives latines au XIIIe siècle, volume 8 of Manuscripta Biblica, De Gruyter, Berlin, 2022.
[24] H. A. Houghton, Chapter divisions, capitula lists, and the Old Latin versions of John, Revue bénédictine 121 (2011) 316–356.
[25] H. A. G. Houghton, The Latin New Testament: A Guide to its Early History, Texts, and Manuscripts, Oxford University Press, Oxford, 2016. URL: https://global.oup.com/academic/product/the-latin-new-testament-9780198744733. doi:10.1093/acprof:oso/9780198744733.001.0001.
[26] R. Casavecchia, E. Colombi, M. Maniaci, A. Peri, La ricerca del Progetto DOBiPS - Data Oriented Biblical Paratext Studies, Brepols Publishers, Paratext Studies series, 2025, in press.
[27] S. Minaee, T. Mikolov, N. Nikzad, M. Chenaghlu, R. Socher, X. Amatriain, J. Gao, Large language models: A survey, arXiv preprint arXiv:2402.06196 (2024).
[28] R. Astell, Biblissima, Digital Philology: A Journal of Medieval Cultures 10 (2021) 331–334.
[29] IRHT, Pinakes, 2024. URL: https://pinakes.irht.cnrs.fr/, accessed: 2024-11-20.
[30] P. Andrist, T. Englmeier, S. Dirkse, New digital strategies for creating and comparing the content structure of biblical manuscripts, Journal of Data Mining & Digital Humanities (2023). doi:10.46298/jdmdh.10981.
[31] S. Boodts, G. Schmidt, R. Macchioro, I. Denis, M. Rempt, E. Komen, T. Hermsen, PASSIM research tool, 2024. URL: https://passim.rich.ru.nl/, accessed: 2024-09-15.
[32] P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database systems: Concepts, languages & architectures, McGraw-Hill, 1999.
[33] P. Riva, P. Le Boeuf, M. Žumer, IFLA Library Reference Model: A Conceptual Model for Bibliographic Information, Technical Report, International Federation of Library Associations and Institutions (IFLA), 2018.
[34] Wikidata contributors, Wikidata: The free knowledge base, 2024. URL: https://www.wikidata.org/wiki/Wikidata:Main_Page, accessed: 2024-10-01.
[35] R. Weber, R. Gryson, Biblia sacra iuxta Vulgatam versionem. Editio quinta, Stuttgart: Deutsche Bibelgesellschaft, 2007.
[36] F. Stegmüller, Repertorium Biblicum Medii Aevi, 1. Initia biblica. Apocrypha. Prologi, Consejo Superior de Investigaciones Cientificas, Madrid, 1950.
[37] D. de Bruyne, Sommaires, divisions et rubriques de la Bible latine, A. Godenne, Namur, 1914. Reprinted as Summaries, Divisions and Rubrics of the Latin Bible, with introductions by Pierre-Maurice Bogaert and Thomas O’Loughlin, Turnhout: Brepols, 2015.
[38] PostgreSQL Global Development Group, PostgreSQL, 2022. URL: https://www.postgresql.org/, accessed: 2024-11-01.
[39] pgAdmin Development Team, pgAdmin, 2022. URL: https://www.pgadmin.org/, accessed: 2024-10-03.
[40] A. Brunello, E. Colombi, M. Raffin, N. Saccomanno, GitHub page of the relational database for medieval Latin biblical manuscripts project, 2024. URL: https://github.com/dslab-uniud/Database-biblical-manuscripts, accessed: 2024-09-01.
[41] P.-M. Bogaert, Tobie, Esther et Judith dans la stichométrie de Mommsen, in: Miscellanea codicologica F. Masai dicata MCMLXXIX, volume II, Story-Scientia: Gand, 1979, pp. 545–550.
[42] P.-M. Bogaert, Les livres d’Esdras et leur numérotation dans l’histoire du canon de la Bible latine, Revue Bénédictine 110 (2000) 5–26.
[43] M. Morard, Bibliotheca Sacra. les variations des livres d’Esdras dans la Bible latine, in: Sacra Pagina, IRHT-CNRS, 2024. URL: https://gloss-e.irht.cnrs.fr/php/page.php?id=182, accessed: 2024-11-21.
[44] R. Sedgewick, K. Wayne, Algorithms, 4th ed., Addison-Wesley Professional, 2011.
[45] G. Katsogiannis-Meimarakis, G. Koutrika, A survey on deep learning approaches for text-to-SQL, The VLDB Journal 32 (2023) 905–936.