The Linked Finding Aid as a Platform for Textual Research: The Case Study of the Giuseppe Raimondi Archive*

Francesca Giovannetti¹ [0000-0001-6007-9118] and Francesca Tomasi² [0000-0002-6631-8607]

¹ Department of Classical Philology and Italian Studies, University of Bologna, Italy
² Department of Classical Philology and Italian Studies, University of Bologna, Italy

Abstract. This paper makes new suggestions for rethinking archival finding aids by means of linked open data. It does so by outlining some features of a conceptual model for writers’ archives developed for the Digital Library of the Department of Classical Philology and Italian Studies of the University of Bologna. The model, extending CIDOC-CRM/LRMoo, allows for the representation of complex collections of interrelationships between heterogeneous archival entities, especially texts. It also adopts named graphs as a way of enriching the finding aid with additional and possibly competing interpretations by researchers. Through the case study of Giuseppe Raimondi’s archive, the paper suggests how the adoption of linked open data can broaden the role of the digital finding aid to serve as a platform for archival and textual research.

Keywords: Digital Finding Aid · Linked Open Data · Writers’ Archives.

1 Introduction

In his Introduction to Archival Science, Thomassen describes research on archives as “research on relations” [1]. The same point can be made about textual scholarship: creating a scholarly edition requires extensive research within archives to establish links between texts and documents. This paper considers how the role of the archival finding aid in the context of textual research can be rethought and expanded by means of linked open data and semantic web technologies. Knowledge graphs have only recently been gathering attention as a way of publishing archives on the web.
Compared to that of tree hierarchies, the data structure of knowledge graphs supports the creation of representations that can express higher orders of archival complexity. The descriptive potential of knowledge graphs is grounded on the semantic web architecture, which adopts the Resource Description Framework (RDF) as its base, interoperable data model to convey information through semantic statements taking the form of subject-predicate-object expressions (see [2]).

* Section 1 is by F. Tomasi and Section 2 is by F. Giovannetti. Both authors contributed to Section 3. Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The introduction of knowledge graphs into archival practice, however, does not merely provide archival practitioners with new technological tools but also challenges traditional approaches to archival representation both methodologically and conceptually. From a methodological point of view, the use of knowledge graphs as a method for archival representation prompts us to rethink the structure and organization of the archive towards a shift from neat hierarchies of records to fluid networks of logical interdependencies that can be arranged and rearranged into new representations. The semantic range of such interdependencies is virtually infinite, thanks to the possibility of combining predicates from different ontologies within a single graph and defining new predicates as needed.

From a conceptual point of view, reconfiguring the finding aid as a knowledge graph broadens its role in the context of textual research by allowing multiple interpretations by archivists and researchers to be incorporated into the archival representation as complex collections of interrelationships between heterogeneous entities (see [3]).
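The subject-predicate-object form, and the idea of mixing predicates from different ontologies in one graph, can be illustrated with a minimal sketch in plain Python. Everything named here is an assumption made for illustration: the `ex:` identifiers are hypothetical, and the `crm:` and `lrmoo:` predicate labels stand in for terms from two vocabularies rather than reproducing the project's actual data.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers. The "crm:" and "lrmoo:" predicate labels are
# assumed here only to show two vocabularies mixed within a single graph.
GRAPH = {
    Triple("ex:notebook", "crm:P2_has_type", "ex:type/notebook"),
    Triple("ex:notebook", "crm:P128_carries", "ex:text-a"),
    Triple("ex:work", "lrmoo:R3_is_realised_in", "ex:text-a"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return, in sorted order, the triples matching a pattern (None = wildcard)."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


def view_by(graph, position: str):
    """Rearrange the same statements into an index keyed by 'subject' or 'obj'."""
    index = {}
    for t in sorted(graph):
        index.setdefault(getattr(t, position), []).append(t)
    return index
```

Calling `view_by(GRAPH, "subject")` groups the statements around the entities they describe, while `view_by(GRAPH, "obj")` regroups the very same statements around the entities they point to, which is one simple sense in which a graph can be rearranged into new representations without changing the underlying data.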
Our argument stems from a project undertaken by the Department of Classical Philology and Italian Studies (FICLIT) [4] and the Digital Humanities Advanced Research Centre (/DH.arc) [5] of the University of Bologna to expose as linked open data a selection of archives of twentieth-century writers and intellectuals.³ The Giuseppe Raimondi Archive was chosen as a pilot, representative case study for the project.

Giuseppe Raimondi (Bologna, 1898-1985) was an Italian writer. In the immediate aftermath of WWI, in 1918, he founded the literary journal La Raccolta, which published papers by European authors such as Vincenzo Cardarelli, Giuseppe Ungaretti, Guillaume Apollinaire and Blaise Cendrars, many of whom Raimondi met for the first time during the war. The archive, held at FICLIT, provides an example of what we might call ‘multiple sedimentation’: it contains heterogeneous material, both in terms of carriers (notebooks, loose papers, albums, newspaper clippings, printed volumes) and document types (letters, drafts, notes, newspaper and journal articles, illustrated postcards, drawings, photographs).

Research on the archive has highlighted that the records it comprises are highly interconnected with one another as well as with material from other collections, and that the nature of such interrelations is heterogeneous. Consider, for example, a manuscript notebook, a newspaper article and a printed volume containing different versions of a text, possibly with internal corrections; these documents and their relationships span various cultural heritage areas such as library, archival, museum and textual studies.⁴

³ See [6] for the list of personal archives held at FICLIT.
⁴ For an example of a subsequent scholarly reconstruction of interrelationships between heterogeneous archival documents in Giuseppe Raimondi’s archive see [7].
Experiments with CIDOC CRM as a base ontology for representing archival information have already proved fruitful for demonstrating how classes and properties from CIDOC CRM could also be leveraged for the archival domain (see [8], [9], and [10]). However, none of these experiments have dealt specifically with the representation of the life cycle of writers’ archives, and especially of the role of subsequent users-researchers as creators of additional interconnections between texts and documents. In addition, archival description practices to date have tended to focus on the representation of record sets rather than individual documents and texts.⁵ On the other hand, most existing digital scholarly editions prioritize a document-centred view of texts that uses TEI/XML markup [15] over LOD-based representations of textual phenomena and do not address the representation of the archival dimension in which texts participate.⁶

The primary goal of our project is to define a conceptual model that allows for the granular representation of complex interrelationships between heterogeneous documents and texts, such as those described above, within the finding aid. The model, extending CIDOC-CRM and LRMoo (formerly FRBRoo), leverages named graphs to enrich the finding aid with multiple and possibly competing readings by archivists and researchers.⁷

The section that follows presents selected features of the model through a practical example focusing on the representation of intertextual relationships as reconstructed by subsequent users-researchers.

2 Textual Research in the Finding Aid: An Example

Una forca per il poeta François Villon / [Giuseppe Raimondi]. - 1976. - 1 quaderno (7 p. ms., di cui alcune numerate irregolarmente, su 10 c.) ; 21 cm. ((In cop., di altra mano, anche: (Gelo invernale e nostalgia di legna accesa) Il giorno 7.6.76; I dattiloscritti sono inseriti ne “I tetti sulla città”; a c.
[3v]: 24.5.1976; a c. [5v]: 3 luglio 1976. - Contiene anche: A proposito di tegole, di tetti e di fantasmi. – Ms.⁸

This archival note, taken from the existing finding aid of the Giuseppe Raimondi Archive, describes one of Raimondi’s manuscript notebooks.⁹ The note reports four distinct titles: Una forca per il poeta François Villon (from now on T1), Gelo invernale e nostalgia di legna accesa (T2), I tetti sulla città (T3) and A proposito di tegole, di tetti e di fantasmi (T4). However, the note specifies neither which texts are actually contained in the notebook nor the relationships between them, to the detriment of users and researchers. The description also includes three full dates, but they are not explicitly attributed to the corresponding texts.

⁵ Refer to [11] for an in-depth discussion of the topic.
⁶ One exception is the Paolo Bufalini’s Notebook project, which describes intratextual relationships in LOD [12].
⁷ On the transition from FRBRoo to LRMoo see [13].
⁸ University of Bologna, BFICLIT, FR.A, QUADERNI.1 1976 03.
⁹ The existing finding aid of Giuseppe Raimondi’s archive dates back to March 1993.

In fact, the notebook contains only two of the mentioned texts, T1 (dated 24 May 1976) and T4 (3 July 1976). Both became part of T3, a collection of works by Giuseppe Raimondi, which was first published in September 1977. T2, on the other hand, is a variant version of T1 that was published as a newspaper article just two weeks after the creation of T1 (7 June 1976). The archive contains Giuseppe Raimondi’s copy of T3. Furthermore, FICLIT holds two additional copies of the same book in the personal archives of Clemente Mazzotta and Ezio Raimondi, one of which features noteworthy manuscript annotations. How can this scenario, which subsequent research on the archive reconstructed, be represented within the finding aid?
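As a rough illustration of what any answer has to capture, the facts reconstructed above can be written down as explicit statements and then queried. The sketch below is not the project's model: the identifiers (`notebook`, `T1` to `T4`) and the relation names (`contains`, `became_part_of`, `variant_of`) are hypothetical shorthand, and only the titles and dates come from the text.

```python
from datetime import date

# Titles and dates as given in the text. T3 has no full date because only
# its month of first publication (September 1977) is stated.
TEXTS = {
    "T1": {"title": "Una forca per il poeta François Villon",
           "date": date(1976, 5, 24)},
    "T2": {"title": "Gelo invernale e nostalgia di legna accesa",
           "date": date(1976, 6, 7)},
    "T3": {"title": "I tetti sulla città", "date": None},
    "T4": {"title": "A proposito di tegole, di tetti e di fantasmi",
           "date": date(1976, 7, 3)},
}

# Hypothetical relation names; each tuple is (subject, predicate, object).
STATEMENTS = [
    ("notebook", "contains", "T1"),
    ("notebook", "contains", "T4"),
    ("T1", "became_part_of", "T3"),
    ("T4", "became_part_of", "T3"),
    ("T2", "variant_of", "T1"),
]


def objects_of(subject, predicate):
    """Everything the subject points to through the given predicate."""
    return sorted(o for s, p, o in STATEMENTS if s == subject and p == predicate)


def subjects_of(predicate, obj):
    """Everything that points to the object through the given predicate."""
    return sorted(s for s, p, o in STATEMENTS if p == predicate and o == obj)


# Texts physically present in the notebook versus titles merely mentioned.
in_notebook = objects_of("notebook", "contains")
mentioned_only = sorted(set(TEXTS) - set(in_notebook))

# Interval between the notebook date of T1 and the publication date of T2.
days_t1_to_t2 = (TEXTS["T2"]["date"] - TEXTS["T1"]["date"]).days
```

Once the relationships are explicit, the questions the original note leaves open become simple lookups: `in_notebook` lists the texts the notebook actually carries, `mentioned_only` lists the titles that appear in the note without being contained in it, and `subjects_of("became_part_of", "T3")` recovers the texts later gathered into the collection.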
2.1 The Archivist’s Base Description of the Notebook

As anticipated above, our model, of which we only provide a brief account, adopts CIDOC-CRM as a basis for representing archival documents and order. Figure 1 and Figure 2 show a graphical representation of the archivist’s base description of the notebook (the graph is split into two parts for better reading). The following prefixes apply to all figures:

@base .
@prefix fdlo: .
@prefix crm: .
@prefix lrmoo: .
@prefix pro: .
@prefix rdf: .
@prefix np: .

CIDOC-CRM is the best candidate for integrating different conceptualizations into one model because it is event-based and already compatible with LRMoo. The notebook is modelled as an instance of E22 Human-Made Object that carries an F2 Expression incorporating two texts, T1 and T4 (Fig. 1, centre). Because it contains two distinct texts, the notebook is categorised both as a ‘notebook’ and as an archival ‘file’ through the P2 has type property (Fig. 1, top left). The notebook forms part of the archival unit, “QUADERNI.1 1976”, which is also an instance of E22.

One crucial feature of our model is that each physical document is tied to a IIIF manifest [14] (an instance of E73 Information Object) pointing to large-scale, zoomable facsimiles of the object. In a similar way, each expression is linked to a transcription, which can be encoded in TEI/XML (Fig. 1, bottom left). The direct linking of facsimiles and transcriptions to the archival knowledge base supports further research and can act as a basis for the production of new digital scholarly editions of our texts.¹⁰

The creation of the notebook is modelled as an instance of F28 Expression Creation, linked to a specific date and technique (handwriting). The role of author is assigned to Giuseppe Raimondi using the Publishing Role Ontology (PRO) model, which allows for the reification of roles in such a way that each role is always linked to a specific context [17].
In our case, Raimondi holds the role of author in the context of the creation of the notebook. The role of author, the person holding such a role and the created document are linked together using the Role In Time class (Fig. 1, bottom right).

¹⁰ On the boundary between digital archive and edition see [16].

Continuing Figure 1, Figure 2 shows how the Expression Creation activity is divided into two sub-activities, one for each text, each linked to a specific date, via the P9 consists of property (Fig. 2, top left).

The E13 Attribute Assignment class is used to model authorship attribution. The attribute assigned is the Role In Time, and the assignee is Giuseppe Raimondi. The agent responsible for the attribution (in this case an institution rather than an individual) and the time of attribution are linked to the instance of E13 (Fig. 1, bottom right).

2.2 Enhancing the Base Description with Subsequent Interpretations by Researchers

One of the objectives of our model is to support the incorporation of subsequent scholarly interpretations within the finding aid. Figure 3 shows a graph of statements, modelled according to LRMoo, that reconstructs the relationships between T1, T2 and T3 as established by a researcher analysing Giuseppe Raimondi’s literary production. All three texts are modelled as instances of F2 Expression realizing the same work. The texts are linked to their carriers: T1 is carried by the notebook, while T3 is carried by multiple printed volumes available in the personal archives of Giuseppe Raimondi, Ezio Raimondi and Clemente Mazzotta. Because the volumes are from different archival collections, connecting them within the finding aid represents a fundamental step towards dismantling archival data silos (see [18]). Additional relationships inferred by the reasoner on the basis of our ontology are displayed in blue.
These relationships, FDLP2 has variant expression and FDLP3 is related by expression to, automatically link together the alternate versions of a text and the physical documents containing such versions to facilitate search and retrieval.

The graph only shows a subset of possible text-to-text connections. Connections at a deeper level are also possible, such as fragment-to-fragment connections describing authorial changes from one version to another (fragments can be modelled as instances of E90 Symbolic Object belonging to an expression, while ontologies such as the Critical Apparatus Ontology (CAO) [19] provide useful properties for the representation of corrections). For example, one could represent authorial changes to the title, from the initial Gelo invernale e nostalgia di legna accesa to the final Una forca per il poeta François Villon, through the various intermediate stages. Using IIIF, it is also possible to establish links between circumscribed portions of the facsimiles and manuscript fragments.

2.3 Tying Each Set of Assertions to Their Provenance

In order to accommodate multiple perspectives in the finding aid, all collections of statements (in the case discussed above, there are two distinct graphs, the archivist’s and the researcher’s) must be associated with provenance information. This allows the archival knowledge base to describe not only the archive but also the process of archival representation and to integrate more collections of statements over time.

Provenance information is modelled according to the Nanopublication framework [20]. The example below shows the basic structure of the nanopublication encapsulating the researcher’s interpretation from Figure 3. It is composed of four graphs: 1. the graph of assertions being made by the researcher; 2. a graph describing the provenance of the assertions; 3. a graph describing the provenance of the publication itself; 4.
the top graph combining the previous three graphs into a single nanopublication:

    # Graph 1: the assertions being made.
    :assertion-02 {
        # The researcher’s reconstruction of the relationships between the texts (Fig. 3).
    }

    # Graph 2: the provenance of the assertions.
    :provenance-02 {
        :assertion-02 prov:generatedAtTime "2021-05-15T17:15:00Z"^^xsd:dateTime ;
            prov:wasAttributedTo .
    }

    # Graph 3: the provenance of the nanopublication itself.
    :pubinfo-02 {
        :nanopub-02 prov:generatedAtTime "2021-05-15T17:15:00Z"^^xsd:dateTime ;
            prov:wasAttributedTo .
    }

    # Graph 4: the nanopublication and its components.
    :head-02 {
        :nanopub-02 a np:Nanopublication ;
            np:hasAssertion :assertion-02 ;
            np:hasProvenance :provenance-02 ;
            np:hasPublicationInfo :pubinfo-02 .
    }

3 Concluding Remarks

Even in the digital environment, the finding aid remains a key tool for discovery and access. Its reconfiguration as an archival knowledge base has the potential to transform the finding aid into an expanding research platform where complex interrelationships between heterogeneous entities can be explicitly represented.

The primary goal of the project described in this paper is to define a conceptual model for representing writers’ archives that allows for the expression of such interrelationships, with a special focus on connecting texts. Thanks to the use of named graphs, and nanopublications in particular, the model supports the ongoing enrichment of the digital finding aid with subsequent scholarly reconstructions of the context(s) characterizing the records.