Towards ShowVoc: dataset publication and browsing

Armando Stellato 1,2,*,†, Manuel Fiorelli 1,2,†, Tiziano Lorenzetti 2,† and Andrea Turbati 2,†

1 Tor Vergata University of Rome, Italy
2 Lore Star s.r.l., Rome, Italy

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
stellato@uniroma2.it (A. Stellato); manuel.fiorelli@uniroma2.it (M. Fiorelli); tiziano.lorenzetti@lorestar.it (T. Lorenzetti); andrea.turbati@lorestar.it (A. Turbati)
ORCID: 0000-0001-5374-2807 (A. Stellato); 0000-0001-7079-8941 (M. Fiorelli); 0000-0001-5676-8877 (T. Lorenzetti); 0000-0002-6214-4099 (A. Turbati)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
ShowVoc is a web-based, multilingual platform for publication and consumption of datasets complying with Semantic Web standards. Born in the context of the ISA2 European programme for the development of digital solutions for interoperable cross-border and cross-sector public services, ShowVoc aims at providing a one-stop shop for maximizing the diffusion of semantic and lexical resources as Linked Open Data. To this end, ShowVoc combines traditional data provisioning following LOD policies with global activities (e.g. global search, navigation of dataset relationships/alignments, translation API benefiting from multilingual datasets and linksets). A rich dataset browsing interface provides dedicated support for diverse data models (OWL ontologies, SKOS/SKOS-XL thesauri, OntoLex-Lemon lexicons and generic RDF datasets) and linkage possibilities (EDOAL, XKOS). A metadata registry completes the offer, combining different metadata vocabularies into an advanced catalog that can be inspected through a convenient user interface and LOD best practices. Finally, ShowVoc is an ideal companion to VocBench, a popular collaborative editing environment for Semantic Web resources, complementing it to realize an entire workflow embracing all stages of a dataset's life, from realization and maintenance to release and publication.

Keywords
Semantic Web, Linked Open Data, Dataset Catalogs, Metadata repositories, Data consumption

1. Introduction

The Semantic Web [1], which is being built according to Linked Data [2] best practices, is based on the decentralized publication of disparate but interlinked datasets that together form a huge global graph. Although resolvable URIs and query-through-discovery are the defining access mechanisms for a machine-accessible Web and the focus is on linking records, there is still a need, especially for humans, for a coarse-grained perspective made of browsing, querying and visualization capabilities over the published resources.

Discovery by link traversal, à la "follow your nose", is closely related to people surfing the Web in search of information. If we take this analogy seriously, then we should consider people's reliance on search engines as an entry point to the Web. Although semantic web search engines are not as common as they could be, there has been a proliferation of dataset catalogs, both in specific domains and across the web, which play a similar role.

In this paper, we present ShowVoc, a platform for dataset publication and exploitation, which addresses both needs: it allows for the publication of datasets with resolvable URIs and a more sophisticated browsing experience than simple subject pages, while offering a fully-fledged data portal for linked datasets.

ShowVoc can be seen as a companion to VocBench 3 [3], a platform for dataset development and maintenance, inheriting many of its features, such as its advanced multi-model support. However, while most of the operations in VocBench 3 deal with individual datasets, ShowVoc adds a number of cross-dataset operations that rely on managing multiple datasets. These include global search, translation and alignment management, which are based on the idea that multiple datasets contribute to a sort of giant virtual reference for terminology and translation.

ShowVoc is open source and made available under the BSD-3-Clause license. The official project web site is https://showvoc.uniroma2.it/. Source code and deployment artifacts are hosted on Bitbucket at https://bitbucket.org/art-uniroma2/showvoc.

The paper is structured as follows. Section 2 discusses related work. Section 3 briefly describes the architecture of ShowVoc. Section 4 delves into the main features of ShowVoc. Then, Section 5 argues for its impact. Finally, Section 6 draws conclusions.

2. Related work

We start our discussion of related efforts on data portals with CKAN (https://ckan.org/), which has established itself as a de facto standard, particularly in the public sector, with its rich API and support for federation of catalogs. Within the scientific community, Zenodo (https://zenodo.org/), based on the open-source software Invenio (https://inveniosoftware.org/), has established itself as the go-to solution for ensuring data persistence, similar to what arXiv (https://arxiv.org/) has achieved for preprint publication. Regarding the impact of archiving, the Open Archives Initiative [4] (OAI) is certainly of interest, especially for its metadata harvesting protocol (OAI-PMH).

None of these solutions are specifically tailored to semantic web datasets, beyond the ability to store dumps as files. For this reason, we now consider catalogs of semantic web datasets. LOV [5] is a catalog of Linked Data Vocabularies, while LOD Cloud (https://lod-cloud.net/) hosts, in addition to the eponymous figure, a catalog of the datasets that actually drove the creation of the former. There are also domain-specific catalogs, most notably BioPortal [6] for ontologies related to the biomedical domain. Today, the OntoPortal Alliance [7] has taken over BioPortal's original source code, which is being adopted by portals across various domains, such as agrifood [8] and biodiversity and ecology [9]. Within the field of solid Earth science, we mention a European initiative [10] using metadata and semantic technologies for integration and access of data from diverse sources.

Alignment management, which is addressed by many Semantic Web catalogs, including LOV and OntoPortal, can also be a use case in its own right. For example, the Alignment API [11] ships with a server that can handle an ontology network, with the ability to compute, retrieve, combine, and otherwise manipulate alignments between ontologies. In a related vein, the ELEXIS [12] project aims at linking (legacy) language resources via linked data, and has developed a standard REST API for accessing a catalog of dictionaries. Both of these applications are covered by ShowVoc, as we will see later.

We conclude the section on related work by discussing the publication of linked data. Pubby (https://github.com/cygri/pubby) implements resolvable URIs by querying a SPARQL endpoint. This software is now discontinued, but newer alternatives such as LodView (https://github.com/LodLive/LodView) and Loddy (https://bitbucket.org/art-uniroma2/loddy) have emerged. The triple store Virtuoso [13] has even integrated this feature without the need for third-party software. Subject pages were even taken as a paradigm for data editing, in systems such as TemaTres [14] or OntoWiki [15].

Subject pages are not always the best choice for browsing through data. For example, SKOSMOS (https://skosmos.org/) has become a popular choice for publishing a collection of SKOS thesauri with more sophisticated browsing capabilities, including search and indexing. For ontologies, the need for more organized documentation became apparent. This can be automatically generated from the ontology definitions themselves using tools such as LODE [16] or, more recently, WIDOCO [17]. This feature has also been developed within VocBench 3 using its custom reporting facility. In fact, both browsing tools and documentation pages can be used to resolve URIs, and both use cases are supported by ShowVoc.

3. Architecture

ShowVoc has been designed as a single-page application (SPA), with a frontend running inside a web browser that communicates with a backend server through a REST-like API.

The frontend is developed in TypeScript using the Angular framework and can be delivered to users by any web server or CDN (Content Delivery Network).

The backend server is based on Semantic Turkey [18], the same RDF services platform that powers VocBench 3. The platform, based on an opinionated combination of the Spring Boot (https://spring.io/projects/spring-boot) and PF4J (https://pf4j.org/) frameworks, supports the development and publication of services related to RDF data. Prebuilt services address multiple models and various concerns such as history, validation, and import/export. PF4J makes it easy to deploy new services and extend the capabilities of existing ones by providing implementations of the extension points on which they depend. For example, the export service defines extension points that can be used to provide both the conversion logic to a particular serialization format (i.e., reformatting exporter) and the ability to deploy data to particular targets (i.e., deployer). Semantic Turkey ships with implementations of these extension points for common use cases, but (as mentioned) new ones can be added to the system.

Semantic Turkey relies on the RDF4J framework to process RDF data and interact with triple stores (i.e., RDF database management systems), either in-process within Semantic Turkey or managed as separate processes. The latter option is the preferred method, allowing the use of enterprise-grade triple stores such as Ontotext's GraphDB (https://www.ontotext.com/products/graphdb/).

VocBench 3 and ShowVoc can share the same backend server, with a common set of projects that can be conveniently made accessible through ShowVoc. However, the recommended scenario is to have separate backend servers (and, most importantly, different storage solutions with different expected workloads) for ShowVoc and VocBench 3, so that managers of projects developed within VocBench can submit datasets to the ShowVoc instance for publication [19].

4. Features

We introduce here the main features of ShowVoc and discuss their relevance to the system's use cases (see Figure 1 for an overview of its UI).

Contributions. A ShowVoc dataset portal can optionally allow contributions from visitors. These can request the addition of a new dataset, possibly after conversion from a non-RDF format, and the creation of a development environment within an associated VocBench instance.

Content negotiation. ShowVoc's use cases extend beyond cataloging third-party datasets, as it also addresses the needs of original dataset publishers. These can set up ShowVoc as an advanced browser for their datasets; however, Linked Data rules require that entity identifiers be resolvable via HTTP, which is perhaps the defining characteristic of the Linked Data paradigm. ShowVoc supports this as well, with some endpoints that can be queried by a reverse proxy associated with the domain to implement content negotiation and generate different variants, including machine-readable serializations and a human-friendly page.

Multi-model support. ShowVoc inherits from Semantic Turkey the ability to manage arbitrary RDF datasets, coupled with convenient facilities for OWL ontologies and other less formal Knowledge Organization Systems (KOS) modeled in SKOS, as well as OntoLex-Lemon lexicons. In addition, ShowVoc is aware of various lexicalization models for grounding data in natural language, including RDFS, SKOS(-XL), and OntoLex-Lemon.

At the user interface level, this flexibility is first visible in ShowVoc's resource view, which can display the description of any resource, divided into sections that roughly correspond to different properties. As such, the resource view can display any type of resource, but it can be specialized and made efficient for specific modeling vocabularies through a combination of customized templates (defining the prominent sections for different resource types), specialized sections (e.g., the one grouping class axioms), as well as dedicated support for specific mechanisms (e.g., proper rendering of class axioms in Manchester syntax). ShowVoc works seamlessly with different lexicalization models, which are taken into account when selecting the "labels" for displaying a resource (instead of its IRI or qname) or when populating the "lexicalizations" section of the resource view (which abstracts over the specific lexicalization model).

ShowVoc also provides a number of views to browse the content of the dataset, depending on its nature, such as a class tree, instance list, property tree, concept tree, etc.

Seamless navigation between local and remote datasets. A user who encounters a reference to a resource outside the dataset being browsed can easily jump to it. If the resource belongs to another dataset in the same ShowVoc installation, the user interface automatically switches to that dataset and focuses on the target resource. External resources can also be displayed in a modal dialog populated with information retrieved by dereferencing the resource's URI or by querying a SPARQL endpoint, if one is known to the system.

Figure 1: ShowVoc data view displaying the ELI (European Legislation Identifier) ontology.

SPARQL querying. ShowVoc provides a SPARQL editor with syntax highlighting and completion that can be used to query individual datasets. It also supports federated queries involving other hosted datasets or remote SPARQL endpoints. The former can be done more efficiently if the chosen triple store is GraphDB, which has specific optimizations for local federation.

Results can be downloaded in a variety of formats, and queries can be loaded from different scopes, e.g., system scope for general purpose queries or dataset scope for dataset-specific queries.

Dataset metadata. Proper metadata is considered critical for publishing a dataset according to the FAIR principles [20]. As such, ShowVoc manages a complete description for each dataset, including general metadata (e.g., title), customizable facets (e.g., category, organization), access metadata (e.g., SPARQL endpoint), structural metadata (e.g., URI space), and various metrics that provide insight into the richness of the available content both at the conceptual level (e.g., number of classes, concepts, etc.) and at the lexical level (i.e., regarding the degree of coverage of different natural languages). These metrics can be visualized as a table or as a chart of various types.

ShowVoc manages the dataset descriptions using the Metadata Registry (MDR) component of Semantic Turkey. This in turn manages the available datasets as a DCAT [21] catalog, using a metadata profile based on a combination and interpretation of existing metadata vocabularies (e.g. DCTERMS, FOAF, VoID [22], LIME [23]) together with a small ontology addressing concerns (mostly related to access) not covered by the former.

Distributions. ShowVoc maintains multiple distributions of each dataset. In the first place, these distributions can be used to provide a downloadable data dump of (a given version of) the dataset. However, they also include any additional files, including documentation.

Global Search. ShowVoc allows users to perform a full-text search across all hosted datasets. Matched entities, grouped by dataset, are displayed with their labels, IRIs, and skos:note specializations (which include definitions, examples, etc.).

Translation. Similar to global search, this feature allows users to look up a term in a given natural language inside the datasets in the catalog, searching for a translation in one or more natural languages.

Alignments. ShowVoc keeps track of the alignments contained in each dataset, providing both a per-dataset and a global view of these alignments.

The former consists of an expanding tree whose roots are the datasets directly aligned with the current dataset. These can be expanded to show the datasets to which they are aligned, and thus to which the current dataset is indirectly aligned. Each node is decorated with the number of links: when one of them is clicked, ShowVoc displays a paginated list of the correspondences (with some filtering features).

Global visualization of alignments is supported by a similar tree view as well as by graph visualization: nodes represent datasets and edges represent alignments. By clicking on a node or an edge, users can view metadata about a dataset or an alignment.

5. Impact

First released in September 2021, ShowVoc is younger than its editing companion VocBench 3, which has become a reference platform since its launch in September 2017. Despite ShowVoc's relatively short history, we can point to some notable adopters. The Food and Agriculture Organization (FAO) of the United Nations (UN) adopted ShowVoc for the Caliper portal (https://www.fao.org/statistics/caliper/en), which publishes statistical classifications as linked data. The Italian branch of LifeWatch ERIC, the European Research Infrastructure Consortium for biodiversity and ecology, used ShowVoc in addition to OntoPortal as a data publication platform supporting resolvable URIs. Last but not least, the Publications Office (OP) of the European Union (EU), which managed the development of the system, has deployed an instance of ShowVoc (https://showvoc.op.europa.eu) "to support interested teams and professionals working for the EU institutions and agencies". The Publications Office also integrated ShowVoc into the EU Vocabularies Portal (https://op.europa.eu/en/web/eu-vocabularies) to provide an "advanced view" of the datasets' content, complementing the portal's own capabilities.

6. Discussion and conclusion

In this work, we have introduced ShowVoc, a web-based multilingual platform for publishing and consulting OWL ontologies, SKOS(-XL) thesauri, OntoLex-Lemon lexicons and generic RDF datasets. Its features and impact on the world of linked open datasets have been discussed. Future work includes further broadening the dedicated support for core modeling vocabularies (e.g. XKOS for statistical classifications), improving the publication workflow from VocBench to ShowVoc, and further exploiting its linking metadata to broaden its possibilities as an authority-based translation system.

Acknowledgements

ShowVoc was originally designed by Tor Vergata University of Rome and is now maintained and evolved by Lore Star srl in the context of the Digital Europe Programme, under management of the Publications Office of the EU in a provision contract with European Dynamics.

References

[1] T. Berners-Lee, J. A. Hendler, and O. Lassila, "The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities," Scientific American, vol. 284, no. 5, pp. 34-43, 2001, doi: 10.1038/scientificamerican0501-34.
[2] T. Berners-Lee. (2006) Design Issues. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html
[3] A. Stellato, M. Fiorelli, A. Turbati, T. Lorenzetti, W. van Gemert, D. Dechandon, C. Laaboudi-Spoiden, A. Gerencsér, A. Waniart, E. Costetchi, and J. Keizer, "VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons," Semantic Web, vol. 11, no. 5, pp. 855-881, Jan 2020, doi: 10.3233/SW-200370.
[4] C. Lagoze and H. Van de Sompel, "The open archives initiative: building a low-barrier interoperability framework," in Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, Roanoke, Virginia, USA, June 24-28, 2001, pp. 54-62, doi: 10.1145/379437.379449.
[5] P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant, "Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web," Semantic Web, vol. 8, no. 3, pp. 437-452, December 2016, doi: 10.3233/SW-160213.
[6] M. Salvadores, P. R. Alexander, M. A. Musen, and N. F. Noy, "BioPortal as a dataset of linked biomedical ontologies and terminologies in RDF," Semantic Web, vol. 4, no. 3, pp. 277-284, 2013, doi: 10.3233/SW-2012-0086.
[7] C. Jonquet, J. Graybeal, S. Bouazzouni, M. Dorf, N. Fiore, X. Kechagioglou, T. Redmond, I. Rosati, A. Skrenchuk, J. L. Vendetti, M. Musen, and members of the OntoPortal Alliance, "Ontology Repositories and Semantic Artefact Catalogues with the OntoPortal Technology," in The Semantic Web – ISWC 2023 (Lecture Notes in Computer Science), T. R. Payne et al., Eds.: Springer, Cham, 2023, vol. 14266, pp. 38-58, doi: 10.1007/978-3-031-47243-5_3.
[8] C. Jonquet, A. Toulet, E. Arnaud, S. Aubin, E. Dzalé Yeumo, V. Emonet, J. Graybeal, M.-A. Laporte, M. A. Musen, V. Pesce, and P. Larmande, "AgroPortal: A vocabulary and ontology repository for agronomy," Computers and Electronics in Agriculture, vol. 144, pp. 126-143, 2018, doi: 10.1016/j.compag.2017.10.012.
[9] X. Kechagioglou, L. Vaira, P. Tomassino, N. Fiore, A. Basset, and I. Rosati, "EcoPortal: An Environment for FAIR Semantic Resources in the Ecological Domain," in Proceedings of the Joint Ontology Workshops 2021. Episode VII: The Bolzano Summer of Knowledge, co-located with the 12th International Conference on Formal Ontology in Information Systems, and the 12th International Conference on Biomedical Ontologies, vol. 2969, 2021. [Online]. Available: https://ceur-ws.org/Vol-2969/paper6-s4biodiv.pdf
[10] D. Bailo, R. Paciello, J. Michalek, M. Cocco, C. Freda, K. Jeffery, and K. Atakan, "The EPOS multi-disciplinary Data Portal for integrated access to solid Earth science datasets," Scientific Data, vol. 10, 2023, doi: 10.1038/s41597-023-02697-9.
[11] J. David, J. Euzenat, F. Scharffe, and C. Trojahn dos Santos, "The Alignment API 4.0," Semantic Web Journal, vol. 2, no. 1, pp. 3-10, 2011.
[12] J. P. McCrae, C. Tiberius, A. F. Khan, I. Kernerman, T. Declerck, S. Krek, M. Monachini, and S. Ahmadi, "The ELEXIS interface for interoperable lexical resources," in Proceedings of the sixth biennial conference on electronic lexicography (eLex). eLex 2019, 2019.
[13] O. Erling, "Virtuoso, a Hybrid RDBMS/Graph Column Store," Data Engineering Bulletin, vol. 35, no. 1, pp. 3-8, 2012.
[14] A. Gonzales-Aguilar, M. Ramírez-Posada, and D. Ferreyra, "TemaTres: software para gestionar tesauros," El profesional de la información, vol. 21, no. 3, pp. 319-325, 2012, doi: 10.3145/epi.2012.may.14.
[15] P. Frischmuth, M. Martin, S. Tramp, T. Riechert, and S. Auer, "OntoWiki – An authoring, publication and visualization interface for the Data Web," Semantic Web, vol. 6, no. 3, pp. 215-240, 2015, doi: 10.3233/SW-140145.
[16] S. Peroni, D. Shotton, and F. Vitali, "Making Ontology Documentation with LODE," in Proceedings of the I-SEMANTICS 2012 Posters & Demonstrations Track, Graz, Austria, September 5-7, 2012, pp. 63-67. [Online]. Available: https://ceur-ws.org/Vol-932/paper12.pdf
[17] D. Garijo, "WIDOCO: A Wizard for Documenting Ontologies," in The Semantic Web – ISWC 2017. ISWC 2017 (Lecture Notes in Computer Science), vol. 10588, 2017, pp. 94-102, doi: 10.1007/978-3-319-68204-4_9.
[18] M. T. Pazienza, N. Scarpato, A. Stellato, and A. Turbati, "Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management," Semantic Web Journal, vol. 3, no. 3, pp. 279-292, 2012, doi: 10.3233/SW-2011-0033.
[19] M. Fiorelli, A. Stellato, I. Rosati, and N. Fiore, "Process-Level Integration for Linked Open Data Development Workflows: A Case Study," in Metadata and Semantic Research (Communications in Computer and Information Science), E. Garoufallou and A. Vlachidis, Eds.: Springer, Cham, 2023, vol. 1789, pp. 148-159, doi: 10.1007/978-3-031-39141-5_13.
[20] M. D. Wilkinson, et al., "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data, vol. 3, no. 160018, 2016, doi: 10.1038/sdata.2016.18.
[21] R. Albertoni, D. Browning, S. Cox, A. N. Gonzalez-Beltran, A. Perego, and P. Winstanley, "The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake," Data Intelligence, pp. 1-37, December 2023, doi: 10.1162/dint_a_00241.
[22] K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. (2011, March) World Wide Web Consortium (W3C). [Online]. Available: http://www.w3.org/TR/void/
[23] M. Fiorelli, A. Stellato, J. P. McCrae, P. Cimiano, and M. T. Pazienza, "LIME: the Metadata Module for OntoLex," in The Semantic Web. Latest Advances and New Domains (Lecture Notes in Computer Science), F. Gandon et al., Eds.: Springer International Publishing, 2015, vol. 9088, pp. 321-336, doi: 10.1007/978-3-319-18818-8_20.
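
Illustrative sketch: content negotiation

The content negotiation feature described in Section 4 relies on choosing, for a given resource URI, the variant that best matches the client's HTTP Accept header (a human-friendly page or a machine-readable serialization). The following Python sketch illustrates that selection step only. It is not ShowVoc's implementation and does not use any ShowVoc or Semantic Turkey API: the list of variants, the function names, and the tie-breaking rule (server preference order) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical variants a Linked Data publisher might offer, in server
# preference order (an assumption, not ShowVoc's actual configuration).
VARIANTS = ["text/html", "text/turtle", "application/rdf+xml", "application/ld+json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an HTTP Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """Return 3 for an exact match, 2 for type/*, 1 for */*, 0 for no match."""
    if media_range == media_type:
        return 3
    range_type, _, range_subtype = media_range.partition("/")
    if range_subtype == "*" and range_type == media_type.split("/")[0]:
        return 2
    if media_range == "*/*":
        return 1
    return 0


def negotiate(accept: str, variants: List[str] = VARIANTS) -> Optional[str]:
    """Pick the variant with the highest q-value; None if nothing is acceptable."""
    ranges = parse_accept(accept) or [("*/*", 1.0)]  # empty header: accept anything
    best, best_q = None, 0.0
    for variant in variants:
        matches = [(specificity(r, variant), q) for r, q in ranges
                   if specificity(r, variant) > 0]
        if not matches:
            continue
        # The most specific matching range decides the quality of this variant.
        _, q = max(matches)
        if q > best_q:  # strict comparison keeps server order on ties
            best, best_q = variant, q
    return best
```

In a deployment of the kind described in Section 4, a reverse proxy would perform a selection like this and then forward the request to whichever backend endpoint produces the chosen variant. A browser sending `text/html,application/xhtml+xml,*/*;q=0.8` would receive the HTML page, while a client sending `text/turtle` would receive a Turtle serialization.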