=Paper=
{{Paper
|id=Vol-3762/577
|storemode=property
|title=Towards ShowVoc: dataset publication and browsing
|pdfUrl=https://ceur-ws.org/Vol-3762/577.pdf
|volume=Vol-3762
|authors=Armando Stellato,Manuel Fiorelli,Tiziano Lorenzetti,Andrea Turbati
|dblpUrl=https://dblp.org/rec/conf/ital-ia/StellatoFLT24
}}
==Towards ShowVoc: dataset publication and browsing==
Armando Stellato (1,2,*,†), Manuel Fiorelli (1,2,†), Tiziano Lorenzetti (2,†) and Andrea Turbati (2,†)
(1) Tor Vergata University of Rome, Italy
(2) Lore Star s.r.l., Rome, Italy
Abstract
ShowVoc is a web-based, multilingual platform for the publication and consumption of datasets
complying with Semantic Web standards. Born in the context of the ISA2 European programme
for the development of digital solutions for interoperable cross-border and cross-sector public
services, ShowVoc aims at providing a one-stop shop for maximizing the diffusion of semantic
and lexical resources as Linked Open Data. To this end, ShowVoc combines traditional data
provisioning following LOD policies with global activities (e.g. global search, navigation of dataset
relationships/alignments, translation API benefiting from multilingual datasets and linksets). A
rich dataset browsing interface provides dedicated support for diverse data models (OWL
ontologies, SKOS/SKOS-XL thesauri, OntoLex-Lemon lexicons and generic RDF datasets) as well as
for linkage possibilities (EDOAL, XKOS). A metadata registry completes the offering by combining
different metadata vocabularies into an advanced catalog that can be inspected through a
convenient user interface and accessed following LOD best practices. Finally, ShowVoc is an ideal
companion to VocBench, a popular collaborative editing environment for Semantic Web resources:
together they realize an entire workflow embracing all stages of a dataset's life, from creation
and maintenance to release and publication.
Keywords
Semantic Web, Linked Open Data, Dataset Catalogs, Metadata repositories, Data consumption
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
† These authors contributed equally.
Email: stellato@uniroma2.it (A. Stellato); manuel.fiorelli@uniroma2.it (M. Fiorelli); tiziano.lorenzetti@lorestar.it (T. Lorenzetti); andrea.turbati@lorestar.it (A. Turbati)
ORCID: 0000-0001-5374-2807 (A. Stellato); 0000-0001-7079-8941 (M. Fiorelli); 0000-0001-5676-8877 (T. Lorenzetti); 0000-0002-6214-4099 (A. Turbati)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The Semantic Web [1], which is being built according to Linked Data [2] best practices, is based on the decentralized publication of disparate but interlinked datasets that together form a huge global graph. Although resolvable URIs and query-through-discovery are the defining access mechanism for a machine-accessible Web and the focus is on linking records, there is still a need, especially for humans, for a coarse-grained perspective made of browsing, querying and visualization capabilities over the published resources.

Discovery by link traversal - a la “follow your nose” - is closely related to people surfing the Web in search of information. If we take this analogy seriously, then we should consider people's reliance on search engines as an entry point to the Web. Although semantic web search engines are not as common as they could be, there has been a proliferation of dataset catalogs, both in specific domains and across the web, which play a similar role.

In this paper, we present ShowVoc, a platform for dataset publication and exploitation, which addresses both needs: it allows for the publication of datasets with resolvable URIs and a more sophisticated browsing experience than simple subject pages while offering a fully-fledged data portal for linked datasets. ShowVoc can be seen as a companion to VocBench 3 [3], a platform for dataset development and maintenance, inheriting many of its features, such as its advanced multi-model support. However, while
most of the operations in VocBench 3 deal with individual datasets, ShowVoc adds a number of cross-dataset operations that rely on managing multiple datasets. These include global search, translation and alignment management, which are based on the idea that multiple datasets contribute to a sort of giant virtual reference for terminology and translation.

ShowVoc is open source and made available under the BSD-3-Clause license. The project's official web site is https://showvoc.uniroma2.it/. Source code and deployment artifacts are hosted on Bitbucket at https://bitbucket.org/art-uniroma2/showvoc.

The paper is structured as follows. Section 2 discusses related work. Section 3 briefly describes the architecture of ShowVoc. Section 4 delves into the main features of ShowVoc. Then, Section 5 argues for its impact. Finally, Section 6 draws conclusions.

2. Related work

We should probably start our discussion of related efforts on data portals with CKAN (https://ckan.org/), which had established itself as a de facto standard, particularly in the public sector, with its rich API and support for federation of catalogs. Within the scientific community, Zenodo (https://zenodo.org/), based on the open-source software Invenio (https://inveniosoftware.org/), has established itself as the go-to solution for ensuring data persistence, similar to what arXiv (https://arxiv.org/) has achieved for preprint publication. Regarding the impact of archiving, the Open Archive Initiative [4] (OAI) is certainly of interest, especially for its metadata harvesting protocol (OAI-PMH).

None of these solutions are specifically tailored to semantic web datasets, beyond the ability to store dumps as files. For this reason, we now consider catalogs of semantic web datasets. LOV [5] is a catalog of Linked Data Vocabularies, while LOD Cloud (https://lod-cloud.net/) hosts, in addition to the eponymous figure, a catalog of the datasets that actually drove the creation of the former. There are also domain-specific catalogs, most notably BioPortal [6] for ontologies related to the biomedical domain. Today, the OntoPortal Alliance [7] has taken over BioPortal's original source code, which is being adopted by portals across various domains, such as agrifood [8] and biodiversity and ecology [9]. Within the field of solid Earth science, we mention a European initiative [10] using metadata and semantic technologies for integration and access of data from diverse sources.

Alignment management, which is addressed by many Semantic Web catalogs, including LOV and OntoPortal, can also be a use case in its own right. For example, the Alignment API [11] ships with a server that can handle an ontology network, with the ability to compute, retrieve, combine, and otherwise manipulate alignments between ontologies. In a related vein, the ELEXIS [12] project aims at linking (legacy) language resources via linked data, and has developed a standard REST API for accessing a catalog of dictionaries. Both of these applications are covered by ShowVoc, as we will see later.

We conclude the section on related work by discussing the publication of linked data. Pubby (https://github.com/cygri/pubby) implements resolvable URIs by querying a SPARQL endpoint. This software is now discontinued, but newer alternatives such as LodView (https://github.com/LodLive/LodView) and Loddy (https://bitbucket.org/art-uniroma2/loddy) have emerged. The triple store Virtuoso [13] has even integrated this feature without the need for third-party software. Subject pages were even taken as a paradigm for data editing, in systems such as TemaTres [14] or OntoWiki [15].

Subject pages are not always the best choice for browsing through your data. For example, SKOSMOS (https://skosmos.org/) became a popular choice for publishing a collection of SKOS thesauri with more sophisticated browsing capabilities, including search and indexing. For ontologies, the need for more organized documentation became apparent. This can be automatically generated from the ontology definitions themselves using tools such as LODE [16] or, more recently, WIDOCO [17]. This feature has also been developed within VocBench 3 using its custom reporting facility. In fact, both browsing tools and documentation pages can be used to resolve URIs and both use cases are supported by ShowVoc.

3. Architecture

ShowVoc has been designed as a single-page application (SPA), with a frontend running inside a web browser that communicates with a backend server through a REST-like API.

The frontend is developed in TypeScript using the Angular framework and can be delivered to users by any web server or CDN (Content Delivery Network).

The backend server is based on Semantic Turkey [18], the same RDF services platform that powers VocBench 3. The platform, based on an opinionated
combination of the Spring Boot (https://spring.io/projects/spring-boot) and PF4J (https://pf4j.org/) frameworks, supports the development and publication of services related to RDF data. Prebuilt services address multiple models and various concerns such as history, validation, and import/export. PF4J makes it easy to deploy new services and extend the capabilities of existing ones by providing implementations of the extension points on which they depend. For example, the export service defines extension points that can be used to provide both the conversion logic to a particular serialization format (i.e., reformatting exporter) and the ability to deploy data to particular targets (i.e., deployer). Semantic Turkey ships with implementations of these extension points for common use cases, but (as mentioned) new ones can be added to the system.

Semantic Turkey relies on the RDF4J framework to process RDF data and interact with triple stores (i.e., RDF database management systems), either in-process within Semantic Turkey or managed as separate processes. The latter option is the preferred method, allowing the use of enterprise-grade triple stores such as Ontotext's GraphDB (https://www.ontotext.com/products/graphdb/).

VocBench 3 and ShowVoc can share the same backend server, with a common set of projects that can be conveniently made accessible through ShowVoc. However, the common, recommended scenario is to have separate backend servers (and, most importantly, different storage solutions with different expected workloads) for ShowVoc and VocBench 3, so that managers of projects developed within VocBench can submit datasets to the ShowVoc instance for publication [19].

4. Features

We will introduce here the main features of ShowVoc and discuss their relevance to the system's use cases (see Figure 1 for an overview of its UI).

Contributions. A ShowVoc dataset portal can optionally allow contributions from visitors. These can request the addition of a new dataset, possibly after conversion from a non-RDF format, and the creation of a development environment within an associated VocBench instance.

Content negotiation. ShowVoc's use cases extend beyond cataloging third-party datasets, as it also addresses the needs of original dataset publishers. These can set up ShowVoc as an advanced browser for their datasets; however, Linked Data rules require that entity identifiers be resolvable via HTTP, which is perhaps the defining characteristic of the Linked Data paradigm. ShowVoc supports this as well, with some endpoints that can be queried by a reverse proxy associated with the domain to implement content negotiation and generate different variants, including machine-readable serializations and a human-friendly page.

Multi-model support. ShowVoc inherits from Semantic Turkey the ability to manage arbitrary RDF datasets, coupled with convenient facilities for OWL ontologies and other less formal Knowledge Organization Systems (KOS) modeled in SKOS, as well as OntoLex-Lemon lexicons. In addition, ShowVoc is aware of various lexicalization models for grounding data in natural language, including RDFS, SKOS(-XL), and OntoLex-Lemon.

At the user interface level, this flexibility is first visible in ShowVoc's resource view, which can display the description of any resource, divided into sections that roughly correspond to different properties. As such, the resource view can display any type of resource, but it can be specialized and made efficient for specific modeling vocabularies through a combination of customized templates (defining the prominent sections for different resource types), specialized sections (e.g., the one grouping class axioms), as well as dedicated support for specific mechanisms (e.g., proper rendering of class axioms in Manchester syntax). ShowVoc works seamlessly with different lexicalization models, which are taken into account when selecting the "labels" for displaying a resource (instead of its IRI or qname) or when populating the "lexicalizations" section of the resource view (which abstracts over the specific lexicalization model).

ShowVoc also provides a number of views to browse the content of the dataset, depending on its nature, such as a class tree, instance list, property tree, concept tree, etc.

Seamless navigation between local and remote datasets. A user who encounters a reference to a resource outside the dataset being browsed can easily jump to it. If the resource belongs to another dataset in the same ShowVoc installation, the user interface automatically switches to that dataset and focuses on the target resource. External resources can also be displayed in a modal dialog populated with information retrieved by dereferencing their IRIs or by querying a SPARQL endpoint, if one is known to the system.
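As a rough illustration of the content negotiation feature described above, the following Python sketch shows how a component in front of a Linked Data service might pick a variant from an HTTP `Accept` header. It is not ShowVoc's implementation: the `VARIANTS` table, the function names and the suffix-based redirect URLs are all hypothetical, and the matching is deliberately simplified (it ranks by q-value only and ignores the specificity rules of full HTTP content negotiation).

```python
from typing import List, Optional, Tuple

# Hypothetical table of variants offered for each resource
# (media type -> URL suffix); not taken from ShowVoc.
VARIANTS = {
    "text/html": "html",
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/ld+json": "jsonld",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0  # default q-value when none is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed q-values as "not acceptable"
        ranges.append((media_range, quality))
    return ranges


def negotiate(header: str, default: str = "text/html") -> Optional[str]:
    """Return the supported media type with the highest q-value, or None."""
    if not header.strip():
        return default
    best, best_q = None, 0.0
    for media_range, quality in parse_accept(header):
        if media_range == "*/*":
            candidates = [default]
        elif media_range.endswith("/*"):
            prefix = media_range[:-1]  # e.g. "application/"
            candidates = [m for m in VARIANTS if m.startswith(prefix)]
        else:
            candidates = [media_range] if media_range in VARIANTS else []
        for candidate in candidates:
            if quality > best_q:  # q=0 never wins: it means "not acceptable"
                best, best_q = candidate, quality
    return best


def redirect_target(resource_iri: str, header: str) -> Optional[str]:
    """Build the (hypothetical) URL of the variant to redirect the client to."""
    media_type = negotiate(header)
    if media_type is None:
        return None  # a real server would answer 406 Not Acceptable
    return f"{resource_iri}.{VARIANTS[media_type]}"
```

In this sketch a browser sending `Accept: text/html` is directed to the human-friendly page, while an RDF client sending `Accept: text/turtle` receives a machine-readable serialization of the same resource.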
Figure 1: ShowVoc data view displaying the ELI (European Legislation Identifier) ontology.
SPARQL querying. ShowVoc provides a SPARQL editor with syntax highlighting and completion that can be used to query individual datasets. It also supports federated queries involving other hosted datasets or remote SPARQL endpoints. The former can be done more efficiently if the chosen triple store is GraphDB, which has specific optimizations for local federation.

Results can be downloaded in a variety of formats, and queries can be loaded from different scopes, e.g., system scope for general-purpose queries or dataset scope for dataset-specific queries.

Dataset metadata. Proper metadata is considered critical for publishing a dataset according to the FAIR principles [20]. As such, ShowVoc manages a complete description for each dataset, including general metadata (e.g., title), customizable facets (e.g., category, organization), access metadata (e.g., SPARQL endpoint), structural metadata (e.g., URI space), and various metrics that provide insight into the richness of the available content both at the conceptual level (e.g., number of classes, concepts, etc.) and at the lexical level (i.e., regarding the degree of coverage of different natural languages). These metrics can be visualized as a table or as a chart of various types.

ShowVoc manages the dataset descriptions using the Metadata Registry (MDR) component of Semantic Turkey. This in turn manages the available datasets as a DCAT [21] catalog, using a metadata profile based on a combination and interpretation of existing metadata vocabularies (e.g. DCTERMS, FOAF, VoID [22], LIME [23]) together with a small ontology addressing concerns (mostly related to access) not covered by the former.

Distributions. ShowVoc maintains multiple distributions of each dataset. In the first place, these distributions can be used to provide a downloadable data dump of (a given version of) the dataset. However, they can also include any additional files, including documentation.

Global Search. ShowVoc allows users to perform a full-text search across all hosted datasets. Matched entities, grouped by dataset, are displayed with their labels, IRIs, and skos:note specializations (which include definitions, examples, etc.).

Translation. Similar to global search, this feature allows users to look up a term in a given natural language inside the datasets in the catalog, searching for a translation in one or more natural languages.
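To make the federated querying mentioned under SPARQL querying more concrete, the following Python sketch assembles a SPARQL 1.1 query that uses the standard `SERVICE` clause to join alignment links in a local dataset with labels fetched from a remote endpoint. It is only an illustration under stated assumptions: the function name, the choice of `skos:exactMatch` and `skos:prefLabel`, and the endpoint URL in the usage note are hypothetical and do not reflect queries issued by ShowVoc itself.

```python
import re

# Simplified pattern for language tags such as "it" or "en-GB" (an assumption,
# not a full BCP 47 validator).
LANG_TAG = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")


def build_federated_label_query(remote_endpoint: str, language: str, limit: int = 10) -> str:
    """Build a federated query: local skos:exactMatch links joined with remote labels."""
    # Reject values that could break out of the IRI or the string literal.
    if not remote_endpoint.startswith(("http://", "https://")) or any(
        ch in remote_endpoint for ch in '<>" '
    ):
        raise ValueError(f"not a usable endpoint IRI: {remote_endpoint!r}")
    if not LANG_TAG.match(language):
        raise ValueError(f"not a valid language tag: {language!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")

    lines = [
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
        "SELECT ?concept ?match ?remoteLabel WHERE {",
        # Evaluated against the dataset the query is submitted to.
        "  ?concept skos:exactMatch ?match .",
        # Evaluated by the remote endpoint (SPARQL 1.1 Federated Query).
        f"  SERVICE <{remote_endpoint}> {{",
        "    ?match skos:prefLabel ?remoteLabel .",
        f'    FILTER (lang(?remoteLabel) = "{language}")',
        "  }",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)
```

For example, `build_federated_label_query("https://example.org/sparql", "it")` returns a query string that could be pasted into a SPARQL editor; `https://example.org/sparql` is a placeholder, not a real endpoint.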
Alignments. ShowVoc keeps track of the alignments contained in each dataset, providing both a per-dataset and a global view of these alignments.

The former consists of an expanding tree whose roots are the datasets directly aligned with the current dataset. These can be expanded to show the datasets to which they are aligned, and thus to which the current dataset is indirectly aligned. Each node is decorated with the number of links: when one of them is clicked, ShowVoc displays a paginated list of the correspondences (with some filtering features).

Global visualization of alignments is supported by a similar tree view as well as by graph visualization: nodes represent datasets and edges represent alignments. By clicking on a node or an edge, users can view metadata about a dataset or an alignment.

5. Impact

First released in September 2021, ShowVoc is younger than its editing companion VocBench 3, which has become a reference platform since its launch in September 2017. Despite ShowVoc’s relatively short history, we can point to some notable adopters. The Food and Agriculture Organization (FAO) of the United Nations (UN) adopted ShowVoc for the Caliper portal (https://www.fao.org/statistics/caliper/en), which publishes statistical classifications as linked data. The Italian branch of LifeWatch ERIC – the European Research Infrastructure Consortium for biodiversity and ecology – used ShowVoc in addition to OntoPortal as a data publication platform supporting resolvable URIs. Last but not least, the Publications Office (OP) of the European Union (EU), which managed the development of the system, has deployed an instance of ShowVoc (https://showvoc.op.europa.eu) “to support interested teams and professionals working for the EU institutions and agencies”. The Publications Office also integrated ShowVoc into the EU Vocabularies Portal (https://op.europa.eu/en/web/eu-vocabularies) to provide an "advanced view" of the datasets' content, complementing the portal's own capabilities.

6. Discussion and conclusion

In this work, we have introduced ShowVoc, a web-based multilingual platform for publishing and consulting OWL ontologies, SKOS(-XL) thesauri, OntoLex-Lemon lexicons and generic RDF datasets. Its features and impact on the world of linked open datasets have been discussed. Future work includes further broadening the dedicated support for core modeling vocabularies (e.g. XKOS for statistical classifications), improving the publication workflow from VocBench to ShowVoc and further exploiting its linking metadata to broaden its possibilities as an authority-based translation system.

Acknowledgements

ShowVoc was originally designed by Tor Vergata University of Rome and is now maintained and evolved by Lore Star srl in the context of the Digital Europe Programme, under management of the Publications Office of the EU in a provision contract with European Dynamics.

References

[1] T. Berners-Lee, J. A. Hendler, and O. Lassila, "The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities," Scientific American, vol. 284, no. 5, pp. 34-43, 2001, doi: 10.1038/scientificamerican0501-34.
[2] T. Berners-Lee. (2006) Design Issues. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html
[3] A. Stellato, M. Fiorelli, A. Turbati, T. Lorenzetti, W. van Gemert, D. Dechandon, C. Laaboudi-Spoiden, A. Gerencsér, A. Waniart, E. Costetchi, and J. Keizer, "VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons," Semantic Web, vol. 11, no. 5, pp. 855-881, Jan 2020, doi: 10.3233/SW-200370.
[4] C. Lagoze and H. Van de Sompel, "The open archives initiative: building a low-barrier interoperability framework," in Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, Roanoke Virginia USA, June 24-28, 2001, pp. 54-62, doi: 10.1145/379437.379449.
[5] P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant, "Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web," Semantic Web, vol. 8, no. 3, pp. 437-452, December 2016, doi: 10.3233/SW-160213.
[6] M. Salvadores, P. R. Alexander, M. A. Musen, and N. F. Noy, "BioPortal as a dataset of linked biomedical ontologies and terminologies in RDF," Semantic Web, vol. 4, no. 3, pp. 277-284, 2013, doi: 10.3233/SW-2012-0086.
[7] C. Jonquet, J. Graybeal, S. Bouazzouni, M. Dorf, N. Fiore, X. Kechagioglou, T. Redmond, I. Rosati, A. Skrenchuk, J. L. Vendetti, M. Musen, and members of the OntoPortal Alliance, "Ontology Repositories and Semantic Artefact Catalogues with the OntoPortal Technology," in The Semantic Web – ISWC 2023 (Lecture Notes in Computer Science), T. R. Payne et al., Eds.: Springer, Cham, 2023, vol. 14266, pp. 38-58, doi: 10.1007/978-3-031-47243-5_3.
[8] C. Jonquet, A. Toulet, E. Arnaud, S. Aubin, E. Dzalé Yeumo, V. Emonet, J. Graybeal, M.-A. Laporte, M. A. Musen, V. Pesce, and P. Larmande, "AgroPortal: A vocabulary and ontology repository for agronomy," Computers and Electronics in Agriculture, vol. 144, pp. 126-143, 2018, doi: 10.1016/j.compag.2017.10.012.
[9] X. Kechagioglou, L. Vaira, P. Tomassino, N. Fiore, A. Basset, and I. Rosati, "EcoPortal: An Environment for FAIR Semantic Resources in the Ecological Domain," in Proceedings of the Joint Ontology Workshops 2021. Episode VII: The Bolzano Summer of Knowledge, co-located with the 12th International Conference on Formal Ontology in Information Systems, and the 12th International Conference on Biomedical Ontologies, vol. 2969, 2021. [Online]. Available: https://ceur-ws.org/Vol-2969/paper6-s4biodiv.pdf
[10] D. Bailo, R. Paciello, J. Michalek, M. Cocco, C. Freda, K. Jeffery, and K. Atakan, "The EPOS multi-disciplinary Data Portal for integrated access to solid Earth science datasets," Scientific Data, vol. 10, 2023, doi: 10.1038/s41597-023-02697-9.
[11] J. David, J. Euzenat, F. Scharffe, and C. Trojahn dos Santos, "The Alignment API 4.0," Semantic Web Journal, vol. 2, no. 1, pp. 3-10, 2011.
[12] J. P. McCrae, C. Tiberius, A. F. Khan, I. Kernerman, T. Declerck, S. Krek, M. Monachini, and S. Ahmadi, "The ELEXIS interface for interoperable lexical resources," in Proceedings of the sixth biennial conference on electronic lexicography (eLex). eLex 2019, 2019.
[13] O. Erling, "Virtuoso, a Hybrid RDBMS/Graph Column Store," Data Engineering Bulletin, vol. 35, no. 1, pp. 3-8, 2012.
[14] A. Gonzales-Aguilar, M. Ramírez-Posada, and D. Ferreyra, "TemaTres: software para gestionar tesauros," El profesional de la información, vol. 21, no. 3, pp. 319-325, 2012, doi: 10.3145/epi.2012.may.14.
[15] P. Frischmuth, M. Martin, S. Tramp, T. Riechert, and S. Auer, "OntoWiki – An authoring, publication and visualization interface for the Data Web," Semantic Web, vol. 6, no. 3, pp. 215-240, 2015, doi: 10.3233/SW-140145.
[16] S. Peroni, D. Shotton, and F. Vitali, "Making Ontology Documentation with LODE," in Proceedings of the I-SEMANTICS 2012 Posters & Demonstrations Track, Graz, Austria, September 5-7, 2012, pp. 63-67. [Online]. Available: https://ceur-ws.org/Vol-932/paper12.pdf
[17] D. Garijo, "WIDOCO: A Wizard for Documenting Ontologies," in The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science), vol. 10588, 2017, pp. 94-102, doi: 10.1007/978-3-319-68204-4_9.
[18] M. T. Pazienza, N. Scarpato, A. Stellato, and A. Turbati, "Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management," Semantic Web Journal, vol. 3, no. 3, pp. 279-292, 2012, doi: 10.3233/SW-2011-0033.
[19] M. Fiorelli, A. Stellato, I. Rosati, and N. Fiore, "Process-Level Integration for Linked Open Data Development Workflows: A Case Study," in Metadata and Semantic Research (Communications in Computer and Information Science), E. Garoufallou and A. Vlachidis, Eds.: Springer, Cham, 2023, vol. 1789, pp. 148-159, doi: 10.1007/978-3-031-39141-5_13.
[20] M. D. Wilkinson, et al., "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data, vol. 3, no. 160018, 2016, doi: 10.1038/sdata.2016.18.
[21] R. Albertoni, D. Browning, S. Cox, A. N. Gonzalez-Beltran, A. Perego, and P. Winstanley, "The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake," Data Intelligence, pp. 1-37, December 2023, doi: 10.1162/dint_a_00241.
[22] K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. (2011, March) World Wide Web Consortium (W3C). [Online]. Available: http://www.w3.org/TR/void/
[23] M. Fiorelli, A. Stellato, J. P. McCrae, P. Cimiano, and M. T. Pazienza, "LIME: the Metadata Module for OntoLex," in The Semantic Web. Latest Advances and New Domains (Lecture Notes in Computer Science), F. Gandon et al., Eds.: Springer International Publishing, 2015, vol. 9088, pp. 321-336, doi: 10.1007/978-3-319-18818-8_20.