
                         Semantic Label Property Graph Ontologies: A
                         Methodology for Enhanced Data Management in Digital
                         Libraries
                         Eleonora Bernasconi¹,*, Stefano Ferilli¹
                         ¹ University of Bari Aldo Moro, Department of Computer Science, Via Orabona 4, Bari, Italy


                                           Abstract
                                           Ontologies are crucial for managing and integrating diverse datasets in digital libraries, where data heterogeneity
                                           poses ongoing challenges. This paper presents a novel framework specifically designed to address the unique
                                           needs of digital libraries using Semantic Label Property Graphs. Our methodology aligns with semantic web
                                           standards, offering a sophisticated approach to data management that enhances integration, querying, and
                                           visualization of complex datasets. The proposed framework supports automated ontology generation, advanced
                                           semantic integration, and seamless visualization, leveraging the structural efficiency of Property Graphs with
                                           semantic annotations to optimize resource discovery, management, and retrieval. We detail the architecture
                                           and core functionalities of the framework, demonstrating its adaptability in managing complex ontologies and
                                           improving workflows for researchers and practitioners. Empirical evaluations reveal significant performance
                                           improvements in data management and linked data integration, underscoring the framework’s potential to
                                           streamline workflows and enhance semantic interoperability. This innovative approach addresses the evolving
                                           challenges of large-scale data management, positioning the framework as a valuable tool for the future of digital
                                           libraries.

                                           Keywords
                                           Digital Libraries, Semantic Ontologies, Label Property Graph, Schema Management, Semantic Web, Artificial
                                           Intelligence, Large Language Models


                         1. Introduction
                         Digital libraries are essential platforms for storing, managing, and providing access to vast collections
                         of cultural, historical, and academic resources. These collections are diverse, encompassing textual
                         documents, multimedia, complex metadata, and intricate relationships between various entities such as
                         authors, works, genres, and historical events. As these libraries continue to grow in size and complexity,
                         traditional data management systems increasingly struggle to handle the heterogeneity, scale, and
                         interconnected nature of the data, leading to significant challenges in data integration, retrieval, and
                         usability [1].
                         3rd Italian Workshop on Artificial Intelligence for Cultural Heritage (IAI4CH 2024, https://ai4ch.di.unito.it/), co-located with the
                         23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024). 26-28 November 2024, Bolzano,
                         Italy
                         * Corresponding author.
                         † These authors contributed equally.
                         Email: eleonora.bernasconi@uniba.it (E. Bernasconi); stefano.ferilli@uniba.it (S. Ferilli)
                         ORCID: 0000-0003-3142-3084 (E. Bernasconi); 0000-0003-1118-0601 (S. Ferilli)
                         © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

                            Digital libraries face several critical challenges that impede their ability to effectively manage,
                         integrate, and provide access to their vast collections:

   1. Data heterogeneity and integration: Digital libraries typically aggregate data from a multitude
      of sources, each employing distinct metadata standards and formats, such as Dublin Core and
      MARC [2]. This diversity creates significant obstacles to data integration, as the lack of a unified
      framework complicates efforts to harmonize these varied datasets. Consequently, seamless access
      and resource discovery are often hindered, affecting the overall usability of digital library systems.
   2. Complex relationships and semantic enrichment: The relationships within digital library
      collections—such as the impact of an author on a literary genre or the historical significance of
      a work—are complex and multifaceted. Traditional keyword-based search systems frequently
      fail to capture these nuanced connections, leading to a superficial exploration of the data [3].
      There is a growing need for advanced methods that can identify, represent, and leverage these
      intricate relationships, enriching the user’s ability to explore and discover information in more
      meaningful ways.
   3. Scalability and performance: As digital libraries expand their collections, maintaining efficient
      performance in data retrieval and querying becomes increasingly challenging. The sheer volume
      and complexity of data require sophisticated storage solutions and indexing mechanisms that
      can handle large-scale, semantically rich queries. Without these, performance bottlenecks can
      severely limit the practical utility of digital libraries, particularly when dealing with extensive
      datasets.
   4. Interoperability and data reusability: The ability to easily share and reuse data across various
      platforms is essential, especially in collaborative settings involving multiple institutions, archives,
      and research bodies [4]. However, the absence of interoperable standards poses significant barriers
      to data exchange, reducing the potential of digital libraries to function as interconnected and
      accessible information hubs. Overcoming these interoperability challenges is crucial to enhancing
      the collective value and accessibility of digital library resources [5].

1.1. Role of semantic ontologies and graph databases in Digital Libraries
To address these challenges, the integration of Semantic Web technologies and graph-based data models
has emerged as a critical area of research [6]. The Semantic Web, built on standards such as RDF
(Resource Description Framework) and OWL (Web Ontology Language), aims to create a web of data
that is both machine-readable and semantically meaningful. These technologies allow digital libraries
to represent complex relationships between data points, enhancing searchability, interoperability, and
the overall user experience [7].
   Graph databases, particularly those based on the Label Property Graph (LPG) model, offer a
complementary approach by efficiently managing the interconnected nature of digital library data. LPGs
provide a flexible structure for modeling entities and their relationships, supporting advanced queries
and visualizations that are crucial for exploring complex datasets. The hybrid approach of combining
Semantic Web technologies with graph databases promises to overcome current limitations, offering
a powerful solution for managing, integrating, and retrieving semantically enriched data in digital
libraries [8].


2. Related work
Hybrid approaches that integrate Semantic Web technologies with graph-based data models have
become increasingly relevant in the digital library domain, enhancing data management, integration,
and retrieval capabilities. Previous research has demonstrated the advantages of combining Semantic
Web standards, such as RDF, with graph-based models like Label Property Graphs (LPGs) [9]. This
synergy improves data interoperability [10] and querying functionalities [11], which are crucial for
developing adaptable and efficient data management systems [12] for digital libraries and humanities
research [13].
   Nguyen et al. [14] introduced the Singleton Property Graph to add a semantic web abstraction layer
to graph databases, enabling more sophisticated data interactions. Angles et al. [15] explored methods
for mapping RDF to property graph databases, enhancing the flexibility and utility of hybrid systems.
Hristovski et al. [16] demonstrated the practical application of these approaches in knowledge discovery
by implementing semantic literature-based discovery using graph databases.
   Graph databases such as Neo4j¹, ArangoDB², and Amazon Neptune³ have incorporated RDF and LPG
capabilities, supporting the semantic integration of diverse datasets [17]. These platforms allow digital
libraries to capture the semantic relationships between resources [18], enhancing the searchability and
discoverability of content [19]. However, existing solutions often face challenges related to scalability,
standardization, and performance when handling large-scale, semantically enriched data, which can
limit their effectiveness in practical digital library applications.

2.1. Gaps in existing research
Despite the progress made in integrating Semantic Web technologies and graph databases in digital
libraries, several critical gaps remain unaddressed. One of the key challenges is the limited adoption of
hybrid models that combine RDF and LPGs. Although these hybrid models offer significant advantages
in terms of data integration and semantic enrichment, their implementation is still relatively rare in
digital libraries. This limited adoption is largely due to technical challenges, such as the complexity of
integrating these technologies [20], and the lack of standardized tools [21] and frameworks that can
facilitate their widespread use.
   Scalability also remains a significant issue in the current landscape [22]. As digital libraries continue
to grow, both in terms of the size and complexity of their datasets, existing solutions often struggle
with performance bottlenecks, particularly in areas like querying and data storage. These bottlenecks
can significantly hinder the practical implementation of RDF and LPG-based systems in large-scale
digital libraries, limiting their ability to efficiently handle the large volumes of interconnected data that
these institutions manage.
   Moreover, there is a pressing need for more robust semantic enrichment tools that can automatically
generate and manage semantic annotations within graph-based models. The current lack of sophisticated
tools in this area hampers the ability to create richer data connections and enhance user interactions with
library content. Such tools are essential for facilitating deeper exploration of digital library collections,
allowing users to navigate complex relationships and discover new connections within the data.
   This study is motivated by the need to develop and refine methodologies that leverage the strengths
of Semantic Web technologies and graph databases to address the ongoing challenges faced by digital
libraries. By creating a framework built on Semantic Label Property Graphs (SLPGs), this research aims
to facilitate the management of complex datasets, improve semantic interoperability, and optimize the
user experience for researchers, students, and practitioners in the field of cultural heritage and beyond.


3. Methodology
This research uses the OntoBuilder tool [23] to construct and automatically populate an ontology
tailored to the digital library domain. The methodology consists of several key phases, described in
turn below.
   The process begins with constructing the SLPG ontology schema, where entities (e.g., books, authors,
genres) and their relationships (e.g., authoring, publication, thematic links) are defined. Each node
represents an entity, while edges define relationships, and both can have associated properties. This
structure is flexible, enabling the modeling of complex and heterogeneous data in digital libraries.
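   As a rough illustration of this node-and-edge structure, the following minimal sketch models a labelled property graph in plain Python. The class names (Node, Edge, PropertyGraph), the identifiers such as "book:1", and the AUTHORED relationship type are assumptions made for the example; they are not taken from OntoBuilder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, e.g. a book or an author."""
    node_id: str
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two nodes."""
    source: str
    target: str
    rel_type: str
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse dangling edges so that every relationship links known entities.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)


# Hypothetical identifiers; both the nodes and the edge carry properties.
graph = PropertyGraph()
graph.add_node(Node("book:1", {"Book"}, {"publisher": "Bompiani"}))
graph.add_node(Node("author:1", {"Author"}, {"name": "Umberto Eco"}))
graph.add_edge(Edge("author:1", "book:1", "AUTHORED", {"publicationDate": 1980}))
```

   Keeping properties on edges as well as nodes is what distinguishes this model from a plain triple store, where a statement about a relationship would need an extra construct.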
   Semantic web standards such as RDF, RDF Schema (RDFS), and OWL (Web Ontology Language)
are integrated within the graph-based framework. This allows the representation of rich metadata,
enhancing the system’s ability to understand and manage relationships between entities. RDF triples
are mapped into the LPG structure, enriching nodes and edges with semantic meaning that aligns with
global standards for data interchange and reuse.
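   One possible way to map RDF triples into an LPG is sketched below. The source does not specify the mapping used, so the rules here are an assumption: rdf:type triples become node labels, triples with a literal object become node properties, and triples with an IRI object become edges. The https://example.org/diglib/ namespace is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"
EX = "https://example.org/diglib/"  # hypothetical internal namespace


def local_name(iri):
    """Return the fragment or last path segment of an IRI."""
    return iri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]


def triples_to_lpg(triples):
    """Map (subject, predicate, object, object_is_iri) tuples to nodes and edges.

    Each node keeps its IRI in the "uri" property and each edge keeps its
    predicate IRI, so the graph can be traced back to the RDF source.
    """
    nodes = {}
    edges = []

    def node(iri):
        return nodes.setdefault(iri, {"labels": set(), "properties": {"uri": iri}})

    for s, p, o, o_is_iri in triples:
        subject = node(s)
        if p == RDF_TYPE:
            subject["labels"].add(local_name(o))   # class membership -> label
        elif o_is_iri:
            node(o)                                 # make sure the target exists
            edges.append({"source": s, "target": o,
                          "type": local_name(p), "uri": p})
        else:
            subject["properties"][local_name(p)] = o  # literal -> property
    return nodes, edges


triples = [
    (EX + "book/1", RDF_TYPE, DBO + "Book", True),
    (EX + "book/1", DBO + "isbn", "978-88-452-1523-5", False),
    (EX + "book/1", DBO + "author", "http://dbpedia.org/resource/Umberto_Eco", True),
]
nodes, edges = triples_to_lpg(triples)
```

   Other mappings are possible, for instance turning every literal into its own node; the choice trades graph compactness against fidelity to the original triples.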
¹ https://neo4j.com
² https://arangodb.com
³ https://aws.amazon.com/it/neptune

   Concepts and relationships are extracted from structured and unstructured data sources within the
digital library. Using predefined rules and algorithms, entities like book titles, author names, and
publication dates are automatically identified and incorporated into the SLPG. This automation reduces
the manual effort required to build ontologies while ensuring consistency and accuracy in the ontology
creation process.
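   The rule-based part of this extraction can be pictured with a small sketch. The source does not list its rules, so the regular expressions below are hypothetical stand-ins that only cover one sentence pattern; a real rule set would be far broader.

```python
import re

# Hypothetical rules: each maps a property name to a pattern with a "value" group.
RULES = {
    "title": re.compile(r"^(?P<value>.+?), written by"),
    "author": re.compile(r"written by (?P<value>[A-Z][\w.]*(?: [A-Z][\w.]*)+)"),
    "publisher": re.compile(r"published by (?P<value>[A-Z]\w*(?: [A-Z]\w*)*)"),
    "publicationDate": re.compile(r"\bin (?P<value>\d{4})\b"),
}


def extract_entities(text):
    """Apply every rule to the text and collect the first match of each."""
    found = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group("value")
    return found


sample = ("The Name of the Rose, written by Umberto Eco, is a historical novel "
          "published by Bompiani in 1980.")
entities = extract_entities(sample)
```

   Each extracted value would then be written to the SLPG as a node property or, for authors and publishers, as a linked node.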
   Once the SLPG is constructed, it supports advanced querying capabilities through graph traversal
techniques combined with semantic reasoning. Users can query the ontology to retrieve complex
information, such as identifying all works by a particular author within a specified timeframe or
discovering thematic connections across different collections. The LPG’s inherent flexibility in managing
relationships allows for efficient querying, while semantic annotations ensure the relevance and precision of
search results.
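   The query "all works by a particular author within a specified timeframe" can be sketched as a one-hop traversal over an in-memory graph. This is only an illustration in plain Python, not the query engine of the framework; the node identifiers, the AUTHORED relationship type, and the second book are placeholders invented for the example.

```python
# Toy graph: node id -> properties, plus (source, type, target) edges.
nodes = {
    "a1": {"label": "Author", "name": "Umberto Eco"},
    "b1": {"label": "Book", "title": "The Name of the Rose", "publicationDate": 1980},
    "b2": {"label": "Book", "title": "Placeholder Title", "publicationDate": 1995},
}
edges = [("a1", "AUTHORED", "b1"), ("a1", "AUTHORED", "b2")]


def works_by_author(author_name, start_year, end_year):
    """Follow AUTHORED edges from the author and filter books by year."""
    author_ids = {
        node_id for node_id, props in nodes.items()
        if props["label"] == "Author" and props["name"] == author_name
    }
    titles = []
    for source, rel_type, target in edges:
        if source in author_ids and rel_type == "AUTHORED":
            book = nodes[target]
            if start_year <= book["publicationDate"] <= end_year:
                titles.append(book["title"])
    return sorted(titles)


result = works_by_author("Umberto Eco", 1975, 1985)
```

   In a graph database the same traversal would be expressed declaratively, with the engine using indexes instead of scanning every edge.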
   The methodology supports continuous evolution and scaling of the ontology. As new data is ingested
into the digital library, the ontology can be updated dynamically, allowing for the seamless integration
of additional metadata and relationships. This adaptability ensures that the ontology remains relevant
as the digital library grows and evolves over time.
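   A dynamic update of this kind is often implemented as an "upsert": create the node if it is missing, otherwise merge the new metadata into it. The sketch below assumes that approach; the source does not describe its update mechanism, and the function names and identifiers are hypothetical.

```python
def upsert_node(graph, key, labels=(), properties=None):
    """Create the node if absent; otherwise merge labels and properties."""
    node = graph.setdefault(key, {"labels": set(), "properties": {}})
    node["labels"].update(labels)
    node["properties"].update(properties or {})
    return node


def upsert_edge(edges, source, rel_type, target):
    """Record the relationship once, however often it is ingested."""
    edges.add((source, rel_type, target))


graph, edges = {}, set()

# First ingestion: the book arrives with its publisher.
upsert_node(graph, "book:1", {"Book"}, {"publisher": "Bompiani"})
# Later ingestion: new metadata is merged in, nothing is duplicated.
upsert_node(graph, "book:1", {"Book"}, {"bookStatus": "Available"})
upsert_node(graph, "author:1", {"Author"}, {"name": "Umberto Eco"})
upsert_edge(edges, "author:1", "AUTHORED", "book:1")
upsert_edge(edges, "author:1", "AUTHORED", "book:1")
```
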
   By adhering to RDF and other semantic web standards, the SLPG-based ontologies ensure that data
can be shared and integrated across different systems. This interoperability is crucial for digital libraries
that rely on external data sources, such as linked open data initiatives, to enrich their collections. The
ability to interlink resources across various libraries and cultural heritage institutions enhances the
discoverability and usability of digital assets.
   The final SLPG ontology can be exported into RDF format for integration with other semantic web
technologies or maintained in the LPG format (e.g., Neo4j) to optimize graph-specific features and
performance. This flexibility allows digital libraries to choose the most appropriate format based on
their specific needs, balancing semantic richness with operational efficiency.
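   The RDF export can be illustrated with a minimal serializer that writes a small graph as N-Triples lines. This is a sketch under assumptions: the node and edge layout, the property-to-IRI table, and the https://example.org/diglib/ namespace are hypothetical, and all literals are emitted as plain strings without datatypes or language tags.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"
BOOK = "https://example.org/diglib/book/1"  # hypothetical internal IRI


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in literals."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(nodes, edges, property_iris):
    """Serialize nodes (types and properties) and edges as N-Triples lines."""
    lines = []
    for iri, node in nodes.items():
        for type_iri in node["types"]:
            lines.append(f"<{iri}> <{RDF_TYPE}> <{type_iri}> .")
        for name, value in node["properties"].items():
            literal = escape_literal(str(value))
            lines.append(f'<{iri}> <{property_iris[name]}> "{literal}" .')
    for source, predicate_iri, target in edges:
        lines.append(f"<{source}> <{predicate_iri}> <{target}> .")
    return "\n".join(lines)


nodes = {BOOK: {"types": [DBO + "Book"],
                "properties": {"isbn": "978-88-452-1523-5"}}}
edges = [(BOOK, DBO + "author", "http://dbpedia.org/resource/Umberto_Eco")]
property_iris = {"isbn": DBO + "isbn"}

output = to_ntriples(nodes, edges, property_iris)
```

   Properties attached to edges have no direct counterpart in a plain triple, which is one reason a library might prefer to keep the LPG form for day-to-day operation.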
   This methodology provides a robust and flexible approach for enhancing data management within
digital libraries, offering improvements in metadata organization, search functionality, and
interoperability. By integrating SLPGs with semantic web standards, it enables more efficient handling of
complex datasets, addressing the challenges of scalability, data integration, and advanced querying in
modern digital libraries.

3.1. Use case
In the evolving landscape of digital libraries, effective organization and retrieval of information hinge on
the development of robust ontologies. In this section, we outline a systematic approach to constructing
an ontology that captures essential entities within a digital library framework, such as books, authors,
topics, publishers, locations, and contributors. Using existing semantic resources, this process not only
enhances the richness of the data, but also facilitates better user interactions and knowledge discovery.
A key feature of our approach is the automatic population of the ontology from textual documents.
This allows for the seamless extraction and categorization of information, ensuring that the ontology
remains up-to-date and reflective of the latest publications and research. The following steps detail
the approach, beginning with the integration of DBpedia as a foundational reference for creating our
ontology.

3.1.1. Using DBpedia for guided creation
The first step is guided creation, with DBpedia as a reference. For instance, we select the class
Book as the base entity. From DBpedia, we incorporate several properties that describe books, including:

    • comment: A brief description or annotation of the book.
    • author: The name of the person who wrote the book.
    • subject: The main topics or themes addressed in the book.
    • publisher: The company or organization responsible for publishing the book.
    • publicationDate: The date when the book was first published.
    • isbn: The International Standard Book Number, a unique identifier for books.
  Additionally, we define custom properties specific to the digital library, such as:
    • bookStatus: Indicates the current availability status of the book (e.g., available, checked out).
    • editingContributors: Names of the individuals involved in the editing process of the book.
   Once the class Book is created with these properties, it is named and linked to both the DBpedia
resources and the internal digital library resources. DBpedia properties maintain links to external URIs,
while custom properties are linked to specific URIs representing the internal reference environment of
the digital library.
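   The linking of properties to external and internal URIs can be pictured as a simple lookup table. In the sketch below every IRI is an assumption: the source names the properties but not their URIs, so DBpedia ontology, RDFS, and Dublin Core terms are used as plausible targets, and https://example.org/diglib/ontology/ is a hypothetical internal namespace.

```python
DBO = "http://dbpedia.org/ontology/"
LOCAL = "https://example.org/diglib/ontology/"  # hypothetical internal namespace

# Assumed IRIs; the paper lists the property names only.
BOOK_CLASS = {
    "name": "Book",
    "uri": DBO + "Book",
    "properties": {
        "comment": "http://www.w3.org/2000/01/rdf-schema#comment",
        "author": DBO + "author",
        "subject": "http://purl.org/dc/terms/subject",
        "publisher": DBO + "publisher",
        "publicationDate": DBO + "publicationDate",
        "isbn": DBO + "isbn",
        "bookStatus": LOCAL + "bookStatus",
        "editingContributors": LOCAL + "editingContributors",
    },
}


def split_properties(cls):
    """Separate externally linked properties from library-specific ones."""
    external, internal = [], []
    for name, uri in cls["properties"].items():
        (internal if uri.startswith(LOCAL) else external).append(name)
    return external, internal


external, internal = split_properties(BOOK_CLASS)
```
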

3.1.2. Manual creation example
To describe the manual creation process, consider the book “The Name of the Rose” by Umberto Eco.
The following details how an instance of the class Book would be created and populated:
  Creation of the class Book
    • Name: Book
    • DBpedia Properties:
         – comment: A description of the book.
         – author: Umberto Eco.
         – subject: Historical novel, Mystery.
         – publisher: Bompiani.
         – publicationDate: 1980.
         – isbn: 978-88-452-1523-5.
    • Custom Properties:
         – bookStatus: Available.
         – editingContributors: Maria Bonfantini.
  Manual insertion of properties
    • comment: “The Name of the Rose is a historical novel and mystery written by Umberto Eco, set
      in a Benedictine monastery in the 14th century.”
    • author: Umberto Eco.
    • subject: Historical novel, Mystery.
    • publisher: Bompiani.
    • publicationDate: 1980.
    • isbn: 978-88-452-1523-5.
    • bookStatus: Available.
    • editingContributors: Maria Bonfantini.
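
   The manual insertion above can be sketched as a function that checks the entered property names against those defined for the class before creating the instance. The function name and the dictionary layout are hypothetical; the values are the ones listed above.

```python
# Property names defined for the class Book in this use case.
BOOK_PROPERTIES = {
    "comment", "author", "subject", "publisher",
    "publicationDate", "isbn", "bookStatus", "editingContributors",
}


def create_instance(label, values, allowed=BOOK_PROPERTIES):
    """Create an instance node, rejecting properties the class does not define."""
    unknown = sorted(set(values) - allowed)
    if unknown:
        raise ValueError(f"unknown properties for {label}: {unknown}")
    return {"label": label, "properties": dict(values)}


rose = create_instance("Book", {
    "comment": ("The Name of the Rose is a historical novel and mystery written "
                "by Umberto Eco, set in a Benedictine monastery in the 14th century."),
    "author": "Umberto Eco",
    "subject": ["Historical novel", "Mystery"],
    "publisher": "Bompiani",
    "publicationDate": 1980,
    "isbn": "978-88-452-1523-5",
    "bookStatus": "Available",
    "editingContributors": ["Maria Bonfantini"],
})
```
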

3.1.3. Automatic creation example
During the ontology population phase, instances of classes can be added either manually or automatically.
In the manual mode, users input values for class properties, as demonstrated above. In the automatic
mode, an advanced language model is employed to process text, identifying and categorizing entities
and properties. For example, given the text:

      “The Name of the Rose, written by Umberto Eco, is a historical novel published by Bompiani
      in 1980. The book deals with themes such as faith, truth, and heresy and is available in our
      library.”

   In automatic insertion mode, the language model parses the text, identifying “The Name of the Rose”
as a book, extracting the author “Umberto Eco,” the publisher “Bompiani,” the publication date “1980,”
and other relevant information. This information is then used to automatically create class instances in
the ontology.
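
   The step from model output to ontology instance can be sketched as follows. The source does not name the language model or its output format, so this example assumes the model is prompted to return JSON and replaces the actual model call with a hard-coded string. The "title" key and the unexpected "mood" key are hypothetical, the latter added to show how unknown fields are discarded.

```python
import json

# Keys accepted for a Book instance; "title" is an assumed addition.
BOOK_PROPERTIES = {
    "title", "author", "subject", "publisher", "publicationDate", "bookStatus",
}

# Stand-in for a language model response to the example text (not a real call).
MODEL_RESPONSE = json.dumps({
    "class": "Book",
    "title": "The Name of the Rose",
    "author": "Umberto Eco",
    "publisher": "Bompiani",
    "publicationDate": "1980",
    "subject": ["faith", "truth", "heresy"],
    "bookStatus": "Available",
    "mood": "dark",  # unexpected key, dropped by the parser below
})


def parse_book(raw):
    """Turn a JSON response into a Book instance, keeping only known properties."""
    data = json.loads(raw)
    if data.get("class") != "Book":
        raise ValueError("response does not describe a Book")
    properties = {k: v for k, v in data.items() if k in BOOK_PROPERTIES}
    return {"label": "Book", "properties": properties}


instance = parse_book(MODEL_RESPONSE)
```

   Filtering against the class definition keeps free-form model output from introducing properties the ontology does not define.
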

3.1.4. Linking resources and visualization
Once properties are associated with DBpedia and custom resources, the ontology is enriched with
semantic data. This enriched data can be visualized using graph-based tools [24]. For example, the
SKATEBOARD interface [25] can be used to visualize the connections between books, authors, and
other entities, allowing dynamic exploration and further semantic augmentation of the ontology (see
Figure 1).


Figure 1: SKATEBOARD Interface showing ontological relationships.


4. Conclusion
This study highlights the importance of integrating Semantic Web technologies and graph databases,
specifically through Semantic Label Property Graphs, to enhance data management in digital libraries. By
addressing challenges such as data heterogeneity, complex relationships, scalability, and interoperability,
our methodology demonstrates improved organization and retrieval of diverse datasets.
  An empirical evaluation, detailed in [23], indicates that the proposed methodology significantly
enhances ontology management and linked data integration. The results show strong user satisfaction,
particularly in areas like integration compatibility and ontology representation accuracy, underscoring
the framework’s effectiveness in real-world applications.
  The case study illustrates how our framework enables efficient ontology creation and automated
metadata integration, fostering richer user interactions. The ability to dynamically update ontologies
ensures ongoing relevance and usability.
  Future research can focus on enhancing semantic enrichment tools and addressing scalability
challenges to further improve user experience and collaboration among institutions. Overall, our work
contributes both theoretical insights and practical methodologies for optimizing digital library
management, positioning libraries to better serve cultural and academic resources.


Acknowledgments
This research was partially supported by projects CHANGES “Cultural Heritage Active Innovation
for Sustainable Society” (PE00000020), Spoke 3 “Digital Libraries, Archives and Philology” and FAIR
“Future AI Research” (PE00000013), Spoke 6 “Symbiotic AI”, funded by the Italian Ministry of University
and Research NRRP initiatives under the NextGenerationEU program.


References
 [1] O. Diseiye, S. E. Ukubeyinje, B. D. Oladokun, V. V. Kakwagh, Emerging technologies: Leveraging
     digital literacy for self-sufficiency among library professionals, Metaverse Basic and Applied
     Research 3 (2024) 59–59.
 [2] J. V. Krefft, Z. Du, R. Bakker, From Dublin Core to MARC: crosswalking ETD metadata from Digital
     Commons to the library catalog (2020).
 [3] H. Lee, S. Yoon, Z. Park, “Semantic” in a digital curation model, Journal of Data and Information
     Science 5 (2020) 81–92.
 [4] K. Shahzad, S. A. Khan, Factors affecting the adoption of integrated semantic digital libraries (SDLs):
     a systematic review, Library Hi Tech 41 (2023) 386–412.
 [5] P. Fafalios, K. Petrakis, G. Samaritakis, K. Doerr, A. Kritsotaki, Y. Tzitzikas, M. Doerr, FAST CAT:
     collaborative data entry and curation for semantic interoperability in digital humanities, Journal
     on Computing and Cultural Heritage (JOCCH) 14 (2021) 1–20.
 [6] B. O’Neill, L. Stapleton, Digital cultural heritage standards: from silo to semantic web, AI &
     Society 37 (2022) 891–903.
 [7] S. Auer, A. Oelen, M. Haris, M. Stocker, J. D’Souza, K. E. Farfar, L. Vogt, M. Prinz, V. Wiens, M. Y.
     Jaradeh, Improving access to scientific literature with knowledge graphs, Bibliothek Forschung
     und Praxis 44 (2020) 516–529.
 [8] E. Bernasconi, M. Ceriani, S. Ferilli, LPG semantic ontologies: A tool for interoperable schema
     creation and management, Information 15 (2024) 565.
 [9] H. Moon, Z. Zhao, J. Choi, S. Han, A novel property graph model for knowledge representation
     on the web, International Journal of Engineering & Technology 7 (2018) 187–190.
[10] S. Ferilli, R. Basili, F. Esposito, Hybrid approaches to semantic data management, Journal of Data
     Semantics 12 (2023) 123–145.
[11] T. Liebig, M. Opitz, V. Vialard, M. Wenzel, Scalable no-code knowledge graph exploration and
     querying with SemSpect, in: SEMANTiCS (Posters & Demos), 2023.
[12] S. Ferilli, E. Bernasconi, D. Di Pierro, D. Redavid, A graph DB-based solution for semantic
     technologies in the future internet, Future Internet 15 (2023) 345.
[13] D. Di Pierro, S. Ferilli, D. Redavid, LPG-based knowledge graphs: A survey, a proposal and current
     trends, Information 14 (2023) 154.
[14] V. Nguyen, H. Y. Yip, H. Thakkar, Q. Li, E. Bolton, O. Bodenreider, Singleton property graph:
     Adding a semantic web abstraction layer to graph databases, BlockSW/CKG@ISWC 2599 (2019)
     1–13.
[15] R. Angles, H. Thakkar, D. Tomaszuk, Mapping RDF databases to property graph databases, IEEE
     Access 8 (2020) 86091–86110.
[16] D. Hristovski, A. Kastrin, D. Dinevski, T. C. Rindflesch, Towards implementing semantic
     literature-based discovery with a graph database, DBKDA 2015 (2015) 190.
[17] D. Fernandes, J. Bernardino, et al., Graph databases comparison: AllegroGraph, ArangoDB,
     InfiniteGraph, Neo4J, and OrientDB, Data 10 (2018) 0006910203730380.
[18] J. J. Miller, Graph database applications and concepts with Neo4j, in: Proceedings of the Southern
     Association for Information Systems Conference, Atlanta, GA, USA, volume 2324, 2013, pp. 141–147.
[19] J. Cheng, Graph feature management: Impact, challenges and opportunities, in: Proceedings
     of the 6th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and
     Network Data Analytics (NDA), 2023, pp. 1–1.
[20] S. Khayatbashi, S. Ferrada, O. Hartig, Converting property graphs to RDF: a preliminary study
     of the practical impact of different mappings, in: Proceedings of the 5th ACM SIGMOD Joint
     International Workshop on Graph Data Management Experiences & Systems (GRADES) and
     Network Data Analytics (NDA), 2022, pp. 1–9.
[21] H. V. Thakker, On Supporting Interoperability between RDF and Property Graph Databases, Ph.D.
     thesis, Universitäts-und Landesbibliothek Bonn, 2021.
[22] E. Iglesias, M.-E. Vidal, D. Collarana, D. Chaves-Fraga, Empowering the SDM-RDFizer tool for scaling
     up to complex knowledge graph creation pipelines, Semantic Web (2024) 1–28.
[23] E. Bernasconi, M. Ceriani, S. Ferilli, LPG semantic ontologies: A tool for interoperable schema
     creation and management, Information 15 (2024). URL: https://www.mdpi.com/2078-2489/15/9/565.
     doi:10.3390/info15090565.
[24] E. Bernasconi, M. Ceriani, D. Di Pierro, S. Ferilli, D. Redavid, Linked data interfaces: A survey,
     Information 14 (2023). URL: https://www.mdpi.com/2078-2489/14/9/483. doi:10.3390/info14090483.
[25] E. Bernasconi, D. Di Pierro, D. Redavid, S. Ferilli, SKATEBOARD: Semantic knowledge advanced tool
     for extraction, browsing, organisation, annotation, retrieval, and discovery, Applied Sciences 13
     (2023). URL: https://www.mdpi.com/2076-3417/13/21/11782. doi:10.3390/app132111782.