<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Web Semantics 75 (2023) 100753. doi:10.1016/j.websem.2022.100753.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.6092/issn.2532-8816/9091</article-id>
      <title-group>
        <article-title>Modelling Knowledge for the PAVES-e Project: a Formal Ontology of Cesare Pavese's Work</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Arena</string-name>
          <email>arenagiuseppe137@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Salvatore Cristofaro</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Gafà</string-name>
          <email>giovanni.gafa@phd.unict.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daria Spampinato</string-name>
        </contrib>
        <aff>
          <institution>Independent Researcher</institution>
        </aff>
        <aff>
          <institution>CNR - ISTC, Catania, Italy</institution>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Catania, Italy - Université Jean Moulin Lyon 3 de Lyon</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>33</volume>
      <issue>15</issue>
      <fpage>9</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>This paper describes the OntoPavese ontology developed as part of the PAVES-e project. OntoPavese is a large-sized ontology that covers various aspects of Cesare Pavese's works; it implements different representation levels of Pavese's production items, along with a number of related features dealing with such entities as temporal events, persons and places. A purpose-built web tool, OntoPavesePathExplorer, is also described; it allows users to visually explore the relationships between individuals defined within the ontology.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Cesare Pavese</kwd>
        <kwd>Digital Scholarly Edition</kwd>
        <kwd>Semantic Edition</kwd>
        <kwd>FRBR</kwd>
        <kwd>LRM</kwd>
        <kwd>CIDOC-CRM</kwd>
        <kwd>RiC</kwd>
        <kwd>Visual Relationship Exploration</kwd>
        <kwd>Knowledge extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p><sup>1</sup> https://digitalpavese.cnr.it/en/ontopavese-2/. Last accessed 2025/08/18.</p>
      <p><sup>2</sup> The web portal is accessible at https://digitalpavese.cnr.it/. Last accessed 2025/08/18.</p>
      <p><sup>3</sup> Text Encoding Initiative, the de facto international standard for digital scholarly editions of texts: https://www.tei-c.org/. Last accessed 2025/08/18.</p>
      <p><sup>4</sup> https://teipublisher.com/. Last accessed 2025/08/18.</p>
      <p><sup>5</sup> A comprehensive overview of digital scholarly editions can be found in the catalogs of Franzini (https://dig-ed-cat.acdh.oeaw.ac.at/) and Sahle (https://v3.digitale-edition.de/). Last accessed 2025/08/18.</p>
      <p><sup>6</sup> A notable example is the edition of the Quaderni di Paolo Bufalini in [5].</p>
      <p><sup>7</sup> A good example of a bibliographic ontology is BiGrafo, relating to the works of Franco Fortini: https://github.com/DFCLAM/bigrafo. Last accessed 2025/08/18.</p>
      <p>
        Partly in order to answer the aforementioned competency questions (CQs), we have chosen to accompany the PaveseInTesto
edition with a formal (RDF/OWL) bibliographic ontology. Furthermore, ontologies are effective in
addressing several issues that arise in the specialized semantic representation of data, as is
the case with OntoPavese:
• The available information is often incomplete. For instance, the creation date of a work might be
missing some details (e.g., the day).
• Web publications are, by their very nature, open-ended works that allow for corrections,
integrations, and additions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
• Many of the available data are structured into complex semantic hierarchies. For instance, Pavese’s
drafts may contain either private writings or literary works; the latter can be further divided into
essays and creative works, and within creative works a distinction can be made between texts in
verse and texts in prose.
• Using an ontology enables the integration of data to enrich query results; moreover, adopting
well-known standards for the semantic description of the entities within the edition-archive
makes it possible to share information with the scientific community and turns PAVES-e into a semantic
digital edition.
      </p>
      <p>The paper also outlines the process of constructing the knowledge graph using the XML files
containing TEI annotations from the digital editions, along with other project materials. Additionally, the
OntoPavesePathExplorer tool is described, enabling the visual exploration of individual relationships
within the ontology.<sup>8</sup></p>
      <p>The paper is divided into three main sections. Section 2 provides insights into the nature of OntoPavese,
its underlying data and models, and other related features. Section 3 presents the
OntoPavesePathExplorer tool, describing its main functionalities together with some formal
details about the abstract structures underpinning them. Finally, Section 4 deals with the population of
OntoPavese.<sup>9</sup></p>
    </sec>
    <sec id="sec-2">
      <title>2. OntoPavese’s model</title>
      <sec id="sec-2-1">
        <title>2.1. The data</title>
        <p>The data available within the project concern a very significant portion of Pavese’s textual production;
more precisely: all known editions, in volumes and periodicals, containing any text by the author, in
their first edition and in subsequent editions and reprints (when deemed appropriate by the domain
experts); all works written by Pavese, including the unpublished ones, regardless of whether they are
private writings or intended for the public; all manuscripts of the author archived at the “Guido Gozzano
— Cesare Pavese” Study Center that contain texts of the author’s main creative works in prose and poetry,
selected by the domain experts.</p>
        <p>All of these entities need to be represented within the ontology, which requires, first and foremost,
identifying the main information items (entity attributes) that characterize them. A
careful analysis of the available materials and the project targets revealed the following needs:
• For editions, it is necessary to represent (as a minimum) the attributes related to the
bibliographic metadata.
• For works, besides the title, it is also necessary to represent the type (e.g., “poem” or “story”),
the place and date of writing, and, in the case of letters, the recipient.</p>
        <p>
          • For manuscripts, it is necessary to represent the attributes pertaining to the archival metadata.
        </p>
        <p><sup>8</sup> Both the OntoPavese ontology and the OntoPavesePathExplorer tool were introduced in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This paper presents a number
of advancements and provides more detailed and comprehensive descriptions than [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p><sup>9</sup> This work is the result of a constant collaboration among the authors during the phases of conception, planning, drafting,
and revision. In particular, Section 1 is credited to Daria Spampinato, Section 2 to Giovanni Gafà, Section 3 to Salvatore
Cristofaro and Section 4 to Giuseppe Arena.
        </p>
        <p>The first step in the development of our ontology is the identification of the most appropriate standards
to represent this information, which can be understood through three key data-related perspectives:
the bibliographic perspective, which deals with the editions of Pavese’s texts; the philological perspective,
which, in our case, mainly concerns the process of creating and revising the works; and the archival
perspective, which pertains to the preservation of the documents that transmit those works and texts. For
instance, in the case of a poetry collection such as Lavorare stanca [9], we have to describe the various
editions and the characteristics of the work and of the textual units (the poems) that compose it, as well
as those of the documents containing the different manuscript drafts of these poems, complete with the
relevant archival information.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Modelling the bibliographic and philological information</title>
        <sec id="sec-2-2-1">
          <title>2.2.1. The LRM Model</title>
          <p>LRM (Library Reference Model) [10], established by IFLA, is the de facto standard for describing the semantics of bibliographic
information. LRM is a high-level conceptual model, currently available as a formal
ontology, LRMoo [11].<sup>10</sup></p>
          <p>OntoPavese uses various entities from LRMoo and the formal CIDOC-CRM ontology<sup>11</sup> to describe
not only the bibliographic level of information of PAVES-e, but also the philological one. In particular,
the ontology adopts the core structure of LRM, inherited from the FRBR model [12], exploiting the four
abstraction levels Work, Expression, Manifestation and Item (WEMI), implemented in LRMoo by means
of the classes F1_Work, F2_Expression, F3_Manifestation, and F4_Item, respectively.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. A Modelling Example: Lavorare stanca</title>
          <p>The levels Work and Expression make it possible to model Pavese’s work from a philological perspective with
considerable expressive power. Let us return to the case of the collection Lavorare stanca, mentioned
earlier, which raised several editorial issues, including the drafting of the individual poems
and their inclusion and ordering within the collection [13]. Two significantly different versions of the
collection exist: the one published in 1936 by the Solaria publishing house [14], and the one published in
1943 [9] by Einaudi. These two editions differ essentially in which poems they include (the 1936 edition
had to contend with censorship under Fascism), as well as in the internal ordering and textual contents
of these poems (although the changes at this level are often marginal).</p>
          <p>We can represent Lavorare stanca as a single Work (i.e., an instance of F1_Work), realized (property
R3_is_realised_in) in two different Expressions (i.e., two instances of F2_Expression), to which there
correspond (property R4i_is_embodied_in) the first editions of 1936 and 1943 (instances of F3_Manifestation),
as well as all subsequent editions that refer to either of these Expressions. The collection, as a Work, is
related to the individual Works representing the poems it contains via the property R67_has_part, just
as the Expressions of these poems are related to those of Lavorare stanca, to which they belong, through
the property R5i_is_component_of. The choice of these two properties
required some debate: LRMoo provides, for F1_Work, the property R74_uses_expression_of and,
for F2_Expression, the corresponding property R75_incorporates, which could be employed to describe the
situation where a Work includes texts that realize another, different Work. However,
employing R74_uses_expression_of and R75_incorporates proved problematic for the following reasons,
and we therefore decided not to adopt them, though we do not rule out revisiting this choice in the
future. In our opinion, the creative act of conceiving a collection of poems does not consist in
the mere sum of the creative acts by which the individual poems were composed. From this perspective,
using the above properties to define the relationship between a work and its components could
be the more appropriate choice. On the other hand, the use-case examples accompanying the LRMoo
definitions of R74_uses_expression_of and R75_incorporates seem to point in the opposite
direction, where a poetry collection can be viewed as the ordered set of poems it contains. In any case,
since our choice suits our purposes well enough, there is no need to dwell further on such issues.</p>
          <p><sup>10</sup> http://iflastandards.info/ns/lrm/lrmoo/. Last accessed 2025/08/18.</p>
          <p><sup>11</sup> The classes and properties of LRMoo integrate with those of the formal CIDOC-CRM ontology: http://www.cidoc-crm.org/cidoc-crm/. Last accessed 2025/08/18.</p>
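          <p>The modelling pattern described in this section can be sketched as a small set of subject-predicate-object triples. The sketch below is only illustrative: the class and property names are those mentioned in the text, whereas every identifier prefixed with ex: (including the assignment of Ulisse to the 1943 Expression) is a hypothetical placeholder, not an actual OntoPavese individual, and plain Python tuples stand in for RDF triples.</p>
          <preformat>
```python
# Hypothetical identifiers (ex:...) stand in for OntoPavese individuals;
# only the class and property names come from the text.
TRIPLES = {
    ("ex:LavorareStanca", "rdf:type", "F1_Work"),
    ("ex:Ulisse", "rdf:type", "F1_Work"),
    # The collection, as a Work, has the poem (also a Work) as a part.
    ("ex:LavorareStanca", "R67_has_part", "ex:Ulisse"),
    # One Work realised in two Expressions (1936 and 1943 versions).
    ("ex:LavorareStanca", "R3_is_realised_in", "ex:LS_expr_1936"),
    ("ex:LavorareStanca", "R3_is_realised_in", "ex:LS_expr_1943"),
    ("ex:LS_expr_1936", "rdf:type", "F2_Expression"),
    ("ex:LS_expr_1943", "rdf:type", "F2_Expression"),
    # Each Expression is embodied in a first edition (Manifestation).
    ("ex:LS_expr_1936", "R4i_is_embodied_in", "ex:Solaria_1936"),
    ("ex:LS_expr_1943", "R4i_is_embodied_in", "ex:Einaudi_1943"),
    ("ex:Solaria_1936", "rdf:type", "F3_Manifestation"),
    ("ex:Einaudi_1943", "rdf:type", "F3_Manifestation"),
    # The poem's Expression is a component of a collection Expression
    # (placeholder assignment, for illustration only).
    ("ex:Ulisse", "R3_is_realised_in", "ex:Ulisse_expr"),
    ("ex:Ulisse_expr", "rdf:type", "F2_Expression"),
    ("ex:Ulisse_expr", "R5i_is_component_of", "ex:LS_expr_1943"),
}


def objects(subject, predicate):
    """Return, sorted, every object o such that (subject, predicate, o) holds."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def manifestations_of(work):
    """Follow the chain Work, Expression, Manifestation for a given Work."""
    found = set()
    for expression in objects(work, "R3_is_realised_in"):
        found.update(objects(expression, "R4i_is_embodied_in"))
    return sorted(found)


print(manifestations_of("ex:LavorareStanca"))
```
          </preformat>
          <p>In this sketch, asking for the Manifestations of the collection traverses both Expressions, which mirrors how subsequent editions can be attached to either of the two versions.</p>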
        </sec>
        <sec id="sec-2-2-3">
          <title>2.2.3. Modelling a Single Work and Its Drafts</title>
          <p>If the collection takes on different forms over time, then, as we have seen, the same can be said
of the poems it contains. LRM understandably leaves a certain degree of freedom regarding the extent of
change that must occur in a work’s text for it to be considered a different Expression (see [11, p. 24]).
Once again, the choice rests with the domain experts, who determine on a case-by-case basis when a
form assumed by the text can be considered a new Expression.</p>
          <p>Text modifications do not occur only within the poems published in the 1936 and 1943 editions of
Lavorare stanca; there are indeed several drafts of each text, both handwritten and typewritten, that
differ, sometimes significantly, from each other and from the printed version. In the special case of
manuscript texts<sup>12</sup> we have decided to create as many Expressions as there are successive drafts of each
poem. To further distinguish these drafts from the Expressions corresponding to the publication, we internally
defined the class ExpressionDraft as a means of collecting them separately from other Expressions. We
also decided not to explicitly model the relationship between successive drafts. LRMoo does in fact
provide the property R76_is_derivative_of, whose description suggests that it could be used, among
other things, to represent a revision of a text. Now, it would certainly be legitimate to establish that
Pavese’s corrective work on a given draft, which is typically quite intense, constitutes an Expression derived
from the source text it revises. However, that would lead us to represent each draft through multiple
Expressions, thus implying a descriptive level that delves into the content of the individual text, which
is beyond the scope of our initial objectives; conversely, asserting that a given draft can be considered
the source of some subsequent draft would require a case-by-case philological analysis of the variants,
which did not fall within our purview. Therefore, we limit ourselves to indicating, via a dedicated data
property, the sequential number of each draft, which (at least) orders the drafts chronologically, while leaving
open the possibility of using the property R76_is_derivative_of in the future, should the domain experts
decide to add this level of description.</p>
          <p>Figure 1 presents the instantiation of the ontology with reference to Lavorare stanca and one of the
poems it comprises, Ulisse.<sup>13</sup></p>
        </sec>
        <sec id="sec-2-2-4">
          <title>2.2.4. Modelling the Manuscripts</title>
          <p>
            Manuscripts were given particular attention, as their pages are reproduced in facsimile format on the
web portal of the project, within the diplomatic editions of the works, and there was thus a desire
to represent them in the ontology. Drafts of texts have a status that is difficult to capture within the
WEMI model [10], which, while allowing the description of the developmental stages of a work from its
conception to the individual printed copy, is aimed at representing bibliographic information and thus
published texts, not those preserved in archives. In a manuscript, the Manifestation is exemplified by
just one Item. Thus, as noted by Elena Pierazzo [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], all the features of the Manifestation are also features
of the Item, making the distinction irrelevant in this case. Hence, as we were not interested in describing
the Manifestation level for manuscripts, we introduced a shortcut property, hasInstantiation, to link each
instance of ExpressionDraft to its corresponding instance of PhysicalDraft, another internally defined
class, which is a subclass of F4_Item (see also Section 2.3 for further details). Instances of PhysicalDraft are composed of
instances of PhysicalPage (property P46_is_composed_of from the CIDOC-CRM ontology), a subclass of
E24_Physical_Man-Made_Thing, also defined in the CIDOC-CRM ontology.
          </p>
          <p><sup>12</sup> Typescripts do not fall within the scope of information to be represented in the ontology.</p>
          <p><sup>13</sup> For the sake of clarity, the diagram does not display all Manifestations of the works, nor all extant drafts of the poem.</p>
          <p>In Figure 1, elements shown in black correspond to the classes and properties defined in LRMoo, whereas those in blue indicate the
internally defined subclasses and subproperties thereof.</p>
        </sec>
        <sec id="sec-2-2-5">
          <title>2.2.5. Modelling the Position and Sections of Texts</title>
          <p>With regard to the bibliographic and philological information discussed so far, we also need to represent
the position occupied by a text within a collection and, where applicable, the specific
section of the work to which it belongs. In this case, it was necessary to adopt a compromise solution, for at least two
reasons:
• On the one hand, the property R71_has_part, which links an instance of Manifestation to its parts, at
least according to its official description and related examples, seems to understand “parts” not as
portions of a given volume, but only as individual publications within a set (such as a specific
volume in a trilogy), and it has been used in OntoPavese in this way.
• On the other hand, information about the position of a poem in an anthology, as well as about
the section containing it, would belong to the Expression level, as it results from the author’s
creative work. However, recording this information within the poem’s Expression would be, if not
a conceptual error, at least a representational stretch. This is because the poem can (and usually
does) originate before, and independently of, the decision to include it in a specific collection, and
because it may appear in different collections and in different positions.</p>
          <p>We therefore created the class PartialEdition, which represents the texts comprising a specific publication
and which allows, via dedicated data properties, the inclusion of information about
the page numbers on which they appear and any sections to which they belong. The Expression of a
poem included in a collection is thus published (property R4i_is_embodied_in) within the Manifestation of
that collection, and specifically it is edited (property hasPartialEdition) in a part (class PartialEdition) of
that volume, through which all the relevant data can be retrieved. Instances of PartialEdition are related
to the instance of F3_Manifestation they belong to via another internally defined property (isPartOf).</p>
        </sec>
        <sec id="sec-2-2-6">
          <title>2.2.6. Modelling the Genre of the Work</title>
          <p>One of the project’s requirements is the ability to perform searches by genre (or type) within Pavese’s
works. Information about the type of a work is not explicitly handled by LRMoo. The formal
CIDOC-CRM ontology provides two possible mechanisms to categorize the described objects: the addition
of subclass hierarchies to an existing, suitable class, or the use of the class E55_Type and the related
property P2_has_type. Since the concepts we intended to represent are relatively stable, a criterion indicated in
the guidelines of the CIDOC-CRM ontology, we preferred to define a system of subclasses within the
Work class. This allows us to distinguish, first of all, between private texts (letters and diary notes) and
texts intended for publication, and then to identify work types within them such as poetry, dialogue,
short story, etc. Figure 2 provides an overview of the currently defined hierarchy.<sup>14</sup></p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Modelling Archival Information</title>
        <p>Once the philological and bibliographic levels were defined, the next step was to choose a standard
for representing archival materials. We adopted the authoritative standard established by the ICA
(International Council on Archives) with RiC-CM (Records in Contexts Conceptual Model),<sup>15</sup> whose latest
version, 1.0, was recently released and includes significant updates. OntoPavese incorporates various
entities from the formal RiC-o ontology, which was published alongside the conceptual model.<sup>16</sup></p>
        <p>One of the core classes of RiC-o is RiC-E02_Record_Resource (corresponding to the Record Resource
entity in RiC-CM), which denotes, in the most general way, an archival resource, or, more precisely, its
informational content, independently of any physical manifestation. Within this class, three
subclasses are defined: RiC-E04_Record, which represents an informational object (a Record) whose identity
derives from the object itself, and which we use to represent individual folders; RiC-E03_Record_Set,
which represents collections (Record Sets) of informational objects whose identity depends on their
members, and which we employ for describing the fonds, and the series and subseries they contain; and
RiC-E05_Record_Part, which represents parts of a Record (Record Parts) whose identity depends on the
latter, and which we adopt to describe the witnesses<sup>17</sup> of a work found within a folder. We also rely
on the class RiC-E06_Instantiation, which designates the inscription of the informational content of a
Record Resource on a physical carrier.</p>
        <p>As previously noted, our goal was to represent perspectives on certain objects. This point becomes
clearer here. The witnesses of a work contained in a folder represent one or more drafts of that work.
However, we already reference those drafts in the ontology as distinct Expressions in the LRM model.
Presenting both archival and philological perspectives could therefore lead to a significant multiplication
of entities in the ontology: the same set of manuscript pages would have to be described both as a Record
Part and as an Expression (more precisely, an instance of the class ExpressionDraft; see Section 2.2.3).</p>
        <p>After consulting the authors of RiC, we concluded that the nature of an Expression in the LRM model
and that of a Record Resource are similar: both refer to informational content independently of the
physical form that may contain it. Likewise, Items and Instantiations describe physical objects with
comparable characteristics. It therefore seemed legitimate to consider an individual manuscript draft of
a work both as an Expression and as a Record Part, and we hence decided to define ExpressionDraft as a
subclass of the two corresponding classes. The same applies to PhysicalDraft, which we characterized
as a subclass of the classes F4_Item and RiC-E06_Instantiation. The property hasInstantiation (see
Section 2.2.4) was then defined as a subproperty of the property hasOrHadInstantiation in RiC-o.<sup>18</sup></p>
        <p><sup>14</sup> Note that, while individual diary pages belong to the class PersonalText, the diary as a whole is classified as a creative work, in
accordance with the domain experts’ recommendation: in fact, Pavese had planned its publication and even gave it a title
[15, p. LXXII].</p>
        <p><sup>15</sup> https://www.ica.org/resource/records-in-contexts-ontology/. Last accessed 2025/08/18.</p>
        <p><sup>16</sup> The version of the RiC-o ontology used is 1.02, available at: https://www.ica.org/standards/RiC/RiC-O_1-0-2.html. Last
accessed 2025/08/18. We are considering integrating the new version 1.1, which was recently released.</p>
        <p><sup>17</sup> In philology, a witness is any manuscript or printed source that transmits a text, i.e., a physical copy through which the text
has been preserved and is accessible today.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Modelling Dates</title>
        <p>In this section we illustrate our proposal to represent dates related to specific types of individuals
described in the ontology: the date when a Work was conceived;<sup>19</sup> the date when a specific draft of a
work was created; and the date when a Manifestation was published. This task was particularly delicate,
as the available material presents a number of different scenarios for dates (or date intervals): some are
fully documented (day, month, and year), but more often they are partially documented (e.g., day and
month are missing), or entirely absent. Incomplete or missing dates could simply be considered
uncertain; however, in some cases, the editor of the work has reconstructed the missing information
with a reasonable degree of confidence.</p>
        <p>Both CIDOC-CRM and RiC include classes and properties for managing dates. Additionally, there
exists the OWL-Time ontology,<sup>20</sup> developed by the SDWWG (Spatial Data on the Web Working Group)
for the W3C and OGC (Open Geospatial Consortium). Each of these models has its own strengths and
weaknesses. However, we chose not to explicitly incorporate any of them into our ontology. Instead, we
followed a different, more direct approach, as it fulfills our aims, particularly in addressing
the competency questions we formulated (cf. Section 1), while avoiding unnecessary complexity. In
fact, we realized that the most effective way to represent dates and support the variety of queries we
envisioned was to create the three dedicated classes Day, Month, and Year, which type the day, month and
year items of a date, respectively. An individual of the class Date is linked to its components (day,
month, year) through specific properties, which also allow us to specify whether each component is
certain, reconstructed by the editor, or simply unknown.</p>
        <p>Works, drafts, and editions are always linked to their relevant creation-date interval by means of the
two properties hasCreationDateFrom (start of the interval) and hasCreationDateTo (end of the interval).
The case in which the creation date is not an interval is handled by simply treating the start and end of
the interval as equal objects.</p>
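        <p>The date model described above can be sketched with a few plain data structures. This is a minimal illustration under stated assumptions, not the ontology's actual encoding: the class names Date, Day, Month and Year and the properties hasCreationDateFrom and hasCreationDateTo come from the text, whereas the names Status, Component, Dated and created_in_year, the collapsing of Day, Month and Year into a single Component type, and the sample values are all hypothetical.</p>
        <preformat>
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical names for the three states mentioned in the text."""
    CERTAIN = "certain"
    RECONSTRUCTED = "reconstructed by the editor"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Component:
    """Stands in for an individual of class Day, Month, or Year."""
    value: Optional[int]
    status: Status


@dataclass(frozen=True)
class Date:
    day: Component
    month: Component
    year: Component


@dataclass(frozen=True)
class Dated:
    """A work, draft, or edition with its creation-date interval."""
    label: str
    has_creation_date_from: Date  # mirrors hasCreationDateFrom
    has_creation_date_to: Date    # mirrors hasCreationDateTo

    def is_point_date(self):
        # A non-interval date is modelled as equal start and end.
        return self.has_creation_date_from == self.has_creation_date_to


UNKNOWN = Component(None, Status.UNKNOWN)


def created_in_year(item, year):
    """True if the given year falls within the item's creation interval."""
    start = item.has_creation_date_from.year
    end = item.has_creation_date_to.year
    if start.value is None or end.value is None:
        return False
    return end.value >= year >= start.value


# Hypothetical draft: year certain, month reconstructed, day unknown.
date = Date(
    day=UNKNOWN,
    month=Component(11, Status.RECONSTRUCTED),
    year=Component(1935, Status.CERTAIN),
)
draft = Dated("hypothetical draft", date, date)
print(draft.is_point_date(), created_in_year(draft, 1935))
```
        </preformat>
        <p>Keeping the certainty status on each component, rather than on the date as a whole, is what lets a query accept, for instance, a certain year while ignoring an unknown day.</p>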
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Visually exploring individual relationships within OntoPavese</title>
      <p>As explained in the previous sections, OntoPavese is a large-sized ontology that covers various
aspects of Pavese’s works. Various string processing and data retrieval tools have been devised and used
to (partially) populate the ontology, particularly in terms of class assertions and object and data property
assertions.<sup>21</sup> The XML files containing the TEI annotations of Pavese’s works, as well as other
textual sources, are suitably processed in order to identify and extract the relevant data items to be
inserted into the ontology, which are subsequently translated (by means of XSLT transformations and the
like) into RDF format.<sup>22</sup> Currently, OntoPavese has been populated with 3,434 class assertions, 18,662
object property assertions, and 7,678 data property assertions, based on a (newly created) signature
comprising 2,984 individuals, 30 classes, 30 object properties, and 15 data properties. As the project
moves forward, these numbers are expected to grow. To assist and guide users in exploring
this wealth of ontological entities, a dedicated web tool, OntoPavesePathExplorer, is being developed.</p>
      <p>OntoPavesePathExplorer allows for an interactively queryable, visual exploration of (relationship)
paths, where a path is (in the present context) a sequence of individuals (the vertices of the path), each
related to the next by means of an object property. Vertices within a path can be explored in any order,
freely moving from one vertex to the next or vice versa. Moreover, for each individual x traversed (as a
path vertex) during exploration, OntoPavesePathExplorer provides information (precomputed and
easily accessible via the user interface) on the object properties that relate x to other individuals and, in
particular, on the depths and heights of x: namely, the lengths of the longest simple paths (i.e., paths
without repeated vertices) that start and end at x. Thus, after selecting a target individual of interest,
one can potentially explore all paths that start and end at that individual, while continuously accessing
positional information (depths and heights) of the vertices traversed along the way, thereby enabling a
more comprehensive exploration. This provides an easy and intuitive mechanism for readily accessing
useful structural information about the semantic organization of the ontology at the level of individual
relationships.<sup>23</sup></p>
      <p><sup>18</sup> To our knowledge, there is currently no attempt to harmonize the RiC model with CIDOC-CRM and LRM, and our project
thus positions itself as an initial step in this direction.</p>
      <p><sup>19</sup> We deemed it reasonable to follow the recommendation of LRM and to refer to the creation date of its first corresponding
Expression.</p>
      <p><sup>20</sup> https://www.w3.org/TR/2022/CRD-owl-time-20221115/. Last accessed 2025/08/18.</p>
      <p><sup>21</sup> For completeness, we recall here that a class assertion formally consists of a double (i.e., an ordered pair) (x, C), where x is
an individual (i.e., class instance) and C is a class, whereas an object property assertion (resp., a data property assertion) is a
triple ⟨x, P, y⟩, where x (the subject) is an individual, P (the predicate) is an object property (resp., a data property), and
y (the object) is an individual (resp., a literal). Object and data property assertions are collectively called property assertions.
(For more on these notions, and other related ones used in this section, we refer to [16] and [17].) If x, P and y are the
subject, predicate and object of a property assertion, respectively, then we say that x is related to y by P. Moreover, if x and
y are individuals, then y is reachable from x if there is a sequence of individuals starting at x and ending at y, and such that
any individual in the sequence is related to the next by an object property. The reachable individuals are thus precisely the
objects of (object) property assertions.</p>
      <p><sup>22</sup> See Section 4 for a more detailed description of the population processes of OntoPavese.</p>
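      <p>The positional information mentioned above can be sketched as follows. The text does not spell out which of the two quantities is measured from which end, so this sketch assumes that the depth of an individual is the length of the longest path ending at it and the height is the length of the longest path starting at it. It also assumes that no individual is reachable from itself, so that every path is simple and memoised recursion terminates. The sample assertions and the individual names are hypothetical; only the property names appear in the text.</p>
      <preformat>
```python
from functools import lru_cache

# Hypothetical object property assertions (subject, property, object).
ASSERTIONS = [
    ("work", "R3_is_realised_in", "expression"),
    ("expression", "R4i_is_embodied_in", "manifestation"),
    ("expression", "hasPartialEdition", "partial_edition"),
    ("partial_edition", "isPartOf", "manifestation"),
]

# Adjacency maps: who each individual is related to, and who relates to it.
SUCCESSORS = {}
PREDECESSORS = {}
for subject, _prop, obj in ASSERTIONS:
    SUCCESSORS.setdefault(subject, set()).add(obj)
    PREDECESSORS.setdefault(obj, set()).add(subject)


@lru_cache(maxsize=None)
def height(x):
    """Length (in edges) of the longest path starting at x.

    Assumes an acyclic graph, so all paths are simple.
    """
    return max((1 + height(y) for y in SUCCESSORS.get(x, ())), default=0)


@lru_cache(maxsize=None)
def depth(x):
    """Length (in edges) of the longest path ending at x (acyclic graph)."""
    return max((1 + depth(y) for y in PREDECESSORS.get(x, ())), default=0)


for individual in ("work", "expression", "partial_edition", "manifestation"):
    print(individual, depth(individual), height(individual))
```
      </preformat>
      <p>On a graph with cycles, longest simple paths would require an exhaustive search with an explicit set of visited vertices; the memoised form is used here only because of the acyclicity assumption.</p>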
      <p>From a conceptual point of view, OntoPavesePathExplorer organizes the individuals of the ontology
hierarchically in a tree-like structure, the path-tree, arranged vertically along branches of increasing
levels (corresponding to their shared depths) and grouped horizontally based on the object properties
that relate individuals at the preceding level to them. More formally, the path-tree is the
abstract edge-labelled tree T (over the family of individual sets), defined recursively by the following
conditions:
(1) The root of T is the set of all individuals x such that, for no property assertion ⟨s, p, o⟩ (subject, property, object) in the
ontology, it is the case that o = x (i.e., x is not a reachable individual; see footnote 21).<sup>24</sup>
(2) For any node N of T, consisting of the set {x₁, x₂, . . . , xₙ} of individuals, and any object property
p, the (possibly empty) set of all individuals y such that, for some 1 ⩽ i ⩽ n, ⟨xᵢ, p, y⟩ is a property
assertion in the ontology, is a child-node of N, connected to N by a (single) edge labelled p.
(3) No node of T has child-nodes other than those given by clause (2).</p>
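      <p>The recursive definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: individuals are taken to be exactly those that occur in some assertion, empty child-nodes are omitted, and a hypothetical max_depth parameter bounds the recursion. The toy assertions and property names are invented for the example.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical toy assertions (subject, property, object); not taken from OntoPavese.
ASSERTIONS = [
    ("work1", "isRealisedIn", "expr1"),
    ("expr1", "isEmbodiedIn", "manif1"),
    ("work1", "hasAuthor", "pavese"),
]


def root_set(assertions):
    """Clause (1): individuals that never occur as the object of an assertion."""
    individuals = {s for s, _, _ in assertions} | {o for _, _, o in assertions}
    reachable = {o for _, _, o in assertions}
    return frozenset(individuals - reachable)


def child_nodes(node, assertions):
    """Clause (2), without empty children: one child per property applied to the node."""
    by_property = defaultdict(set)
    for subject, prop, obj in assertions:
        if subject in node:
            by_property[prop].add(obj)
    return {prop: frozenset(objs) for prop, objs in by_property.items()}


def build_path_tree(assertions, max_depth=10):
    """Build the tree as nested dicts; max_depth is a safeguard assumed for this sketch."""

    def expand(node, depth):
        children = {}
        if max_depth > depth:
            for prop, child in child_nodes(node, assertions).items():
                children[prop] = expand(child, depth + 1)
        return {"individuals": node, "children": children}

    return expand(root_set(assertions), 0)


tree = build_path_tree(ASSERTIONS)
```
      </preformat>
      <p>In this toy case the root holds only work1, and following the edge labelled isRealisedIn and then isEmbodiedIn leads to the node holding manif1.</p>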
      <p>Then, once a target set S of individuals is selected, OntoPavesePathExplorer constructs (using
information stored within the nodes of T) and displays two separate collections, P′ and P′′, consisting,
respectively, of all object properties p′ and all object properties p′′, such that any individual x′ in S is
related to some individual y′ by p′ (in which case we say that y′ is entailed by S via application of p′),
and for any individual x′′ in S, there exists some individual y′′ related to x′′ by p′′ (in which case we say
that y′′ is entailed by S via inverse application of p′′).<sup>25</sup> After an object property is chosen
from one of these two collections, OntoPavesePathExplorer computes the corresponding set ∆(S) of
individuals that are entailed by S. This process can be iterated starting from any previously entailed
set, leading to the sequence of individual sets S, ∆(S), ∆(∆(S)), . . . that the user successively
explores and that OntoPavesePathExplorer visualizes along with the object properties used
during entailments. Note that these individual sets S, ∆(S), ∆(∆(S)), . . . can also be stored, as they
are computed, within a dedicated area, from which they can be subsequently selected by the user and
combined by means of the usual set-theoretic operations of union, intersection and complementation. The
combined sets can then be used in the entailment process as well. These mechanisms allow even
complex semantic queries on individual relationships within the ontology.<sup>26</sup></p>
      <p><sup>23</sup> Note that information on properties that relate ontology individuals can be easily retrieved, e.g., by means of simple SPARQL
queries. However, SPARQL does not allow the direct retrieval of positional information.</p>
      <p><sup>24</sup> Observe that, if there were an individual x such that x is reachable from itself, then the root of T would actually have to
be a maximal (however chosen) set of individuals, any two of which are not reachable one from the other. (This indeed
guarantees that all individuals defined in the ontology are included within the nodes of T.) However, in the case at hand of
the ontology OntoPavese, no such individual x reachable from itself actually exists, and thus the provided definition
of the root of T clearly makes it a maximal set as above.</p>
      <p><sup>25</sup> Note that the properties p′ in P′ are grouped based on the depths of the individuals to which individuals x in S are related
by p′, whereas properties p′′ in P′′ are grouped based on the heights of the individuals that are related to x by p′′.</p>
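      <p>The entailment step and the combination of stored sets can be sketched as follows. The function name entail and the toy assertions are hypothetical. The sketch assumes that a set entailed via a property collects every individual reached from at least one member of the starting set, and that inverse application follows the assertions backwards.</p>
      <preformat>
```python
# Hypothetical toy assertions (subject, property, object); not taken from OntoPavese.
ASSERTIONS = [
    ("work1", "isRealisedIn", "expr1"),
    ("work2", "isRealisedIn", "expr2"),
    ("expr1", "isEmbodiedIn", "manif1"),
    ("expr2", "isEmbodiedIn", "manif2"),
    ("work1", "hasAuthor", "pavese"),
    ("work2", "hasAuthor", "pavese"),
]


def entail(individuals, prop, assertions=ASSERTIONS, inverse=False):
    """Return the individuals entailed by a set via (inverse) application of a property."""
    if inverse:
        # Inverse application: collect subjects whose object lies in the starting set.
        return {s for s, p, o in assertions if p == prop and o in individuals}
    # Direct application: collect objects whose subject lies in the starting set.
    return {o for s, p, o in assertions if p == prop and s in individuals}


# Iterating the process yields the successive sets explored by the user.
target = {"work1"}
first_step = entail(target, "isRealisedIn")
second_step = entail(first_step, "isEmbodiedIn")

# Stored sets can be combined with ordinary set operations and reused.
works_by_pavese = entail({"pavese"}, "hasAuthor", inverse=True)
other_works = works_by_pavese - target
their_expressions = entail(other_works, "isRealisedIn")
```
      </preformat>
      <p>Because every intermediate result is an ordinary set, union, intersection and difference compose freely with further entailment steps.</p>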
      <p>The path-tree T is built from the OWL Functional-style Syntax representation (see
https://www.w3.org/TR/owl2-syntax/) of the ontology, from which the object property assertions are
first extracted and then organized into a relation matrix, which is subsequently used to construct T.</p>
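      <p>The extraction step can be illustrated with a minimal sketch. It assumes a simplified input in which each ObjectPropertyAssertion axiom uses prefixed names and carries no annotations. The layout of the relation matrix (rows and columns indexed by individuals, each cell holding the set of properties relating them) is also an assumption, since the text does not specify it. The snippet and its property names are hypothetical.</p>
      <preformat>
```python
import re

# Hypothetical fragment in OWL Functional-style Syntax (prefixed names only).
OFN_SNIPPET = """
Declaration(NamedIndividual(:work1))
ObjectPropertyAssertion(:isRealisedIn :work1 :expr1)
ObjectPropertyAssertion(:isEmbodiedIn :expr1 :manif1)
DataPropertyAssertion(:hasTitle :work1 "Lavorare stanca")
"""

# In this syntax the property comes first, then the source and target individuals.
ASSERTION_PATTERN = re.compile(
    r"ObjectPropertyAssertion\(\s*(\S+)\s+(\S+)\s+([^\s)]+)\s*\)"
)


def extract_assertions(text):
    """Return (subject, property, object) triples for the simple axioms in the text."""
    return [(subj, prop, obj) for prop, subj, obj in ASSERTION_PATTERN.findall(text)]


def relation_matrix(triples):
    """Build a square matrix whose cell [i][j] holds the properties relating i to j."""
    individuals = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {name: position for position, name in enumerate(individuals)}
    matrix = [[set() for _ in individuals] for _ in individuals]
    for subj, prop, obj in triples:
        matrix[index[subj]][index[obj]].add(prop)
    return individuals, matrix


triples = extract_assertions(OFN_SNIPPET)
individuals, matrix = relation_matrix(triples)
```
      </preformat>
      <p>A real ontology file would also contain full IRIs, annotations and property expressions, which would call for a proper parser rather than a regular expression.</p>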
      <p>The graphical interface of OntoPavesePathExplorer presents a list of the individuals defined
within the ontology, searchable by IRI or label, from which the user can select the desired ones to
include in the target sets (see Figure 3). Besides the functionalities described above,
for any individual x selected by the user, OntoPavesePathExplorer can also visualize, upon request,
the collection A(x) of all complete-paths starting at x, linearly arranged in consecutive rows, where a
complete-path starting at x is a maximal length sequence of property assertions such that the object
of any property assertion in the sequence equals the subject of the next property assertion, and the
subject of the first property assertion is x.<sup>27</sup> In order to facilitate the view of the hierarchical structure of
vertices in these complete-paths (i.e., how the subjects and objects of the property assertions are related
to each other), an option is provided to visualize the entire collection A(x) in tree form (see Figure 4).</p>
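      <p>The enumeration of complete-paths can be sketched with a simple depth-first traversal. The sketch reads "maximal" as "cannot be extended by a further assertion" and assumes that no individual is reachable from itself, so that the recursion terminates. The function name and toy assertions are hypothetical.</p>
      <preformat>
```python
# Hypothetical toy assertions (subject, property, object); not taken from OntoPavese.
ASSERTIONS = [
    ("work1", "isRealisedIn", "expr1"),
    ("expr1", "isEmbodiedIn", "manif1"),
    ("work1", "hasAuthor", "pavese"),
]


def complete_paths(start, assertions=ASSERTIONS):
    """List every non-extendable chain of assertions whose first subject is start."""
    outgoing = {}
    for assertion in assertions:
        outgoing.setdefault(assertion[0], []).append(assertion)

    def extend(vertex):
        next_assertions = outgoing.get(vertex, [])
        if not next_assertions:
            return [[]]  # nothing leaves this vertex, so the chain ends here
        return [
            [assertion] + tail
            for assertion in next_assertions
            for tail in extend(assertion[2])
        ]

    # Drop the empty chain returned when start itself has no outgoing assertion.
    return [path for path in extend(start) if path]


paths_from_work1 = complete_paths("work1")
```
      </preformat>
      <p>Each returned path is a list of assertions, so printing one path per line gives the row-wise arrangement, while paths sharing a prefix can be merged to obtain the tree form.</p>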
      <p>
        To conclude this section, it is worth mentioning that OntoPavesePathExplorer was originally
intended to provide a faithful tree-like visual representation of the path-tree as a whole, i.e., with all of
its nodes and edges collectively painted on screen (cf. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]); however, as the project moved forward, a
number of issues arose concerning the huge amount of data items to be fully represented and,
in particular, user interactions during set entailments, which turned out to be somewhat
awkward to manage visually because of the tree-like visual representation of the path-tree. The version of
OntoPavesePathExplorer described here is lighter and more intuitive than the original one.<sup>28</sup>
      </p>
      <p>As a final remark, we also mention that, although OntoPavesePathExplorer has been developed for
the OntoPavese ontology, the software implementing it is actually designed to accept as input the OWL
Functional-style Syntax representation of any OWL ontology, from which OntoPavesePathExplorer is
automatically constructed. In fact, the software has also been tested on a specialized ontology, currently
under development within the COVerLeSS project (see [18, 19]), that deals with the representation of
Italian Verism terms, and for which the use of OntoPavesePathExplorer has indeed proven fruitful,
leading to the discovery of useful relationships between the terms.</p>
    </sec>
    <sec id="sec-4">
      <title>4. On the (semi-automatic) population of OntoPavese</title>
      <p>The population of OntoPavese has been carried out semi-automatically through a structured process of
extracting, transforming, and loading bibliographic data from heterogeneous sources.</p>
      <p><sup>26</sup> Actually, “individual vs literal” relationships are also accounted for by OntoPavesePathExplorer, where literals are
simply treated as individuals, and data properties as object properties.</p>
      <p><sup>27</sup> By footnote 24, such maximal length sequences of property assertions are clearly well defined.</p>
      <p><sup>28</sup> However, the original version has not been definitively abandoned, as it may still be reconsidered and integrated with the
current one.</p>
      <p>In the first phase (extraction), these data were manually extracted and normalized from PDF
bibliographic catalogs such as [20] and from Pavese’s Opera Poetica [21]. Human intervention was required to
interpret layout structures and to distinguish author names, titles, and edition details. For XML/TEI files
(letters, diaries, poems, essays), automatic XSLT transformations were used to extract relevant metadata
(such as filenames, authors, dates and, for letters, sender, recipient and place of origin).<sup>29</sup></p>
      <p>In the second phase (transformation), the data collected from primary bibliographic entries, their
content, and the data extracted from XML files were (re)organized into three groups of Excel spreadsheets,
structured around seven main ontological classes (Work, Expression, Manifestation, Person, Place,
Organization and Date). Each individual is identified by a unique ID, which, in combination with the
namespace of the ontology, allows the automatic generation of an IRI that conforms to Semantic Web
standards. The population process was automated through a Python script that employs the pandas<sup>30</sup>
and RDFLib<sup>31</sup> libraries to map (Excel) table columns to OntoPavese’s properties, distinguishing between
data properties, for literal values, and object properties, for relationships between individuals.</p>
      <p>In the third phase (loading), after initializing the RDF graph and loading the reference ontology,
the code performs a syntactic cleanup of IRIs (using the clean_uri function) and sets up dictionaries that
map table columns to RDF properties. Processing occurs systematically for each of
the seven ontology spreadsheets: each row generates a resource identified by an IRI, together with its properties and
human-readable labels. The system also supports multiple relationships, allows the coexistence
of distinct types for the same resource, and incorporates control features to handle missing values. The
resulting RDF data are exported as an RDF/XML file, which is subsequently imported into GraphDB.</p>
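      <p>The row-to-resource mapping described above can be sketched as follows, using only the Python standard library. Plain dictionaries stand in for spreadsheet rows (which pandas would supply) and tuples stand in for the triples that RDFLib would hold in a graph. The namespace, the column names, the property names, the semicolon separator for multiple values and the body of clean_uri are all assumptions made for the example, not the project's actual code.</p>
      <preformat>
```python
import re

BASE = "https://example.org/ontopavese#"  # hypothetical namespace

# Hypothetical column-to-property dictionaries.
DATA_COLUMNS = {"title": "hasTitle"}        # literal values
OBJECT_COLUMNS = {"author_id": "hasAuthor"}  # links to other individuals


def clean_uri(raw_id):
    """Assumed cleanup: trim, turn spaces into underscores, drop unsafe characters."""
    return re.sub(r"[^A-Za-z0-9_\-.]", "", raw_id.strip().replace(" ", "_"))


def row_to_triples(row, class_name):
    """Turn one spreadsheet row into (subject, predicate, object) tuples."""
    subject = BASE + clean_uri(row["id"])
    triples = [(subject, "rdf:type", BASE + class_name)]
    for column, prop in DATA_COLUMNS.items():
        value = row.get(column)
        if value:  # missing values are skipped
            triples.append((subject, BASE + prop, ("literal", value)))
    for column, prop in OBJECT_COLUMNS.items():
        value = row.get(column)
        if value:
            # A cell may list several targets, giving multiple relationships.
            for target in str(value).split(";"):
                triples.append((subject, BASE + prop, BASE + clean_uri(target)))
    return triples


example_row = {"id": " W 001 ", "title": "Lavorare stanca", "author_id": "P001; P002"}
example_triples = row_to_triples(example_row, "Work")
```
      </preformat>
      <p>Keeping the two dictionaries separate mirrors the distinction between data properties and object properties, so adding a column only requires a new dictionary entry.</p>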
      <p>The workflow was guided by specific methodological principles. To begin with, each resource
(including journals and collected publications) has to be represented in its entirety within the three
LRMoo levels Work, Expression, and Manifestation (unless it is included in the PartialEdition class, in
which case it is sufficient to know the range of the relevant pages). Moreover, a Work needs to be
associated with a single Expression, except in those situations where textual variants (as in the case
of the two editions of Lavorare stanca of 1936 and 1943) or interpretive doubts suggest the creation
of multiple Expressions to preserve potentially relevant information. Furthermore, in order to manage
information redundancy, a criterion is adopted by which key information such as title,
author, and language is replicated at the three LRMoo levels. Finally, in the case of works published
under different titles but referable to the same conceptual entity (for instance, Pavese’s essay The
Spoon River Anthology, reprinted with a new title as part of the posthumous collection of essays La
Letteratura Americana e altri saggi), in order to safeguard the intellectual identity of the resource, it is
necessary to keep separate Expressions that refer to the same Work.</p>
      <p>Knowledge extraction is a crucial step in the construction of knowledge graphs and typically
relies on techniques such as Named Entity Recognition, Natural Language Processing, and Machine
Learning, which are of increasing relevance in the field of Digital Humanities [22, 23]. Declarative mapping
languages such as R2RML and RML offer powerful means to integrate heterogeneous data sources into
RDF [24]; yet their applicability is limited in contexts characterized by fragmented and incrementally
expanding datasets, and they were therefore not adopted in this project. This decision was based not
only on the high heterogeneity of the source formats, ranging from XML/TEI files to bibliographic data
in PDF, but also on the need for precise domain-specific disambiguation, which could not be reliably
automated. Instead, the chosen combination of Excel, RDFLib, and XSLT supports incremental data
entry and editorial revision while avoiding the rigidity of an early fixed database structure. Although this
approach may introduce some risk of inconsistency, semantic coherence is ensured through reasoning and RDF
visualization tools during the post-processing stages.</p>
      <p><sup>29</sup> This is because Pavese’s production is very extensive and, at this stage of the PAVES-e project, only part of the texts has
been encoded in XML/TEI, while OntoPavese already contains bibliographic information relating to his entire work. As a
result, some of the data needed to populate the ontology were already available in XML/TEI, while other data were collected
from various sources.</p>
      <p><sup>30</sup> https://pandas.pydata.org/docs/. Last accessed 2025/08/18.</p>
      <p><sup>31</sup> https://rdflib.readthedocs.io/en/stable/. Last accessed 2025/08/18.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>This paper presents the OntoPavese ontology, developed as part of the PAVES-e project, which focuses
on the literary production of Cesare Pavese. It also introduces the OntoPavesePathExplorer tool,
designed to explore individual relationships within the ontology. Although both the ontology and the
visualization tool are in an advanced stage of development, they are not yet fully complete. In the near
future, we plan to finalize the remaining components and test the functionality of both the ontology
and the visualization tool in practical scenarios. We also intend to engage external users in the testing
process, providing them with tailored guides and usage tips, especially for the OntoPavesePathExplorer
tool.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The PRIN 2022 PAVES-e project, developed in collaboration with the University of Catania (PI: Antonio
Sichera), the University of Turin (AI: Laura Nay) and the CNR (AI: Daria Spampinato), is funded by the
European Union – Next Generation EU, M4C2 (CUP B53D23022270006). The authors also thank the
PRIN PNRR 2022 COVerLeSS project, funded by the European Union – Next Generation EU, M4C2 (CUP
B53D23029310001). Special thanks are also due to Florence Clavaud, member of the Expert Group on
Archival Description of the International Council on Archives, for her generous assistance, and to Laura
Mazzagufo, engaged in the design of the PaveseInTesto interface with TEI Publisher, for her suggestions.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>D'Agata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Del Grosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Palazzolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sichera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <article-title>PAVES-e: Per una Hyperedizione dell'opera di Cesare Pavese</article-title>
          , in: A.
          <string-name>
            <surname>D. Silvestro</surname>
          </string-name>
          , D. Spampinato (Eds.),
          <source>AIUCD 2024 Me.Te. Digitali</source>
          .
          <article-title>Mediterraneo in rete tra testi e contesti</article-title>
          ,
          <source>Proceedings del XIII Convegno Annuale AIUCD2024</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>196</lpage>
          . doi:10.6092/unibo/amsacta/7927.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mazzagufo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cristofaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>D'Agata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Del Grosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sichera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sichera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <article-title>Moving towards a semantic archival edition: the PAVES-e project</article-title>
          , in:
          <source>DH2025 Book of Abstracts</source>
          (in press).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pierazzo</surname>
          </string-name>
          , Digital Scholarly Editing: Theories, Models and Methods, Routledge, London New York,
          <year>2016</year>
          . doi:10.4324/9781315577227.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tomasi</surname>
          </string-name>
          ,
          <article-title>Organizzare la conoscenza: digital humanities e web semantico: un percorso tra archivi, biblioteche e musei</article-title>
          , volume
          <volume>39</volume>
          of
          <article-title>Biblioteconomia e scienza dell'informazione, Editrice Bibliografica</article-title>
          , Milano,
          <year>2022</year>
          . URL: https://doi.org/10.53134/9788893573573.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>