Knowledge graphs within everyone’s means
Maria Angela Pellegrino
Dipartimento di Informatica, Università degli Studi di Salerno, Fisciano (SA) ITALY


                                      Abstract
                                      A Knowledge Graph is a useful means for empowering users to actively explore data of interest and
                                      manage acquired knowledge. However, its most common query language, SPARQL, proves too
                                      complex for lay users. Thus, tools and interfaces are needed that unlock the potential of Knowledge
                                      Graphs while masking the complexity of the underlying query mechanisms. My Ph.D. research is
                                      situated exactly in this context and aims to scaffold end-users in taking advantage of Knowledge Graphs.

                                      Keywords
                                      Semantic Web, Knowledge management, Information retrieval, Data exploitation, Query builder


1. Introduction
The term Knowledge Graph (KG) has recently been used by the Semantic Web community to
refer to any graph-based knowledge representation [1]. Through a Semantic Web lens, KGs
allow for agile navigation of arbitrary entities thanks to explicitly defined links [1]; thus, they are
also referred to as Linked (Open) Data (LD or LOD). Over the past decades, hundreds of datasets
covering a wide range of topical domains have been published using Semantic Web standards [2].
The LOD Cloud [3] (a KG that collects most of the published KGs) counted 12 datasets in 2007 and
currently contains 1,239 datasets. Some of these KGs are proprietary, maintained internally by
companies such as Google, Microsoft, and Apple, while others, like DBpedia [4] and Wikidata [5],
are openly available and maintained by dedicated communities. The central idea of LD is that
data publishers support applications in discovering and integrating data by complying with a set
of best practices in the areas of linking, vocabulary usage, and metadata provision [2]. Thanks to
the wide range of heterogeneous information they store, their ease of navigation, and their
quantitative and qualitative properties, KGs can serve as a critical resource for information
retrieval (IR) and knowledge management (KM).
   The exploitation of KGs is mainly hindered by i) the technical skills required both in query
languages (e.g., SPARQL) and in understanding the semantics of the supported operators [6], which
are too challenging for lay users, and ii) conceptualization issues in understanding how data are
modelled [6, 7].
   These drawbacks have led to the development of tools and interfaces to support users in
interacting with KGs by implicitly composing queries while hiding the underlying complexity.
   My research is situated in this context and aims to propose approaches and prototype tools
that let end-users (with different interests, backgrounds, ages, and needs) express their needs or
explore available data through a Natural Language (NL) interface, so that they can query KGs
and take advantage of them without requiring technical skills in query languages. Instead of
proposing a single tool to address the heterogeneity of the target audience, I opt for a unified
approach to guide KG exploitation (described in Section 2), instantiated in different interfaces
(detailed in Section 3) to fulfil the specific requests of each target group.

CHItaly 2021 Joint Proceedings of Interactive Experiences and Doctoral Consortium, July 11–13, 2021, Bolzano, Italy
mapellegrino@unisa.it (M. A. Pellegrino)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Maria Angela Pellegrino CEUR Workshop Proceedings 36–40


2. From Data Querying to Data Exploitation Approach
The KM process includes 1) data retrieval, 2) data refinement, and 3) data exploitation (Fig. 1). It
requires the cross-fertilization of IR, Information Visualization, and Human-Computer Interaction.


Figure 1: High-level workflow representation. End-users are guided in querying KGs by a (controlled)
natural language interface; retrieved data are modelled as a tabular representation, which can be
refined (if needed) and then exploited in a concrete (and reusable) artifact.


    Data retrieval requires search activities that can be classified into lookup and exploratory
search [8]. Lookup is a search task where users know what they are looking for and can formulate
it as a direct question, as in question-answering (QA) applications. An exploratory search task,
instead, is open-ended: it usually starts with vague information needs and explores data through
iterative query formulation [9], facets or taxonomies [10], and the keyword search paradigm, which
includes auto-suggestions, instant results, and partial matches [11, 12].
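The keyword search paradigm mentioned above (auto-suggestions over partial matches) can be sketched in a few lines. This is a minimal illustration, not the matching logic of any tool cited here: it assumes the interface already holds a list of entity labels fetched from a KG, ranks prefix matches before substring (partial) matches, and breaks ties by label length; the function `suggest` and the sample labels are hypothetical.

```python
"""Minimal keyword auto-suggestion sketch (illustrative ranking rules)."""


def _rank_key(label: str) -> tuple[int, str]:
    # Shorter labels first, then alphabetical, to keep suggestions stable.
    return (len(label), label)


def suggest(query: str, labels: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` labels matching `query`, prefix matches first."""
    needle = query.strip().casefold()
    if not needle:
        return []
    prefix_hits: list[str] = []
    partial_hits: list[str] = []
    for label in labels:
        text = label.casefold()
        if text.startswith(needle):
            prefix_hits.append(label)
        elif needle in text:
            # Partial match: the keyword occurs inside the label.
            partial_hits.append(label)
    ranked = sorted(prefix_hits, key=_rank_key) + sorted(partial_hits, key=_rank_key)
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical labels standing in for KG entity names.
    labels = ["Painting", "Painter", "Museum", "Art museum", "Paint", "Wheatfield painting"]
    print(suggest("paint", labels))
```

On the sample labels, `suggest("paint", labels)` ranks "Paint", "Painter", and "Painting" ahead of the partial match "Wheatfield painting".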
    As a data retrieval interface, I propose query builders enhanced with (controlled) NL interfaces
to guide users in naturally posing questions by simulating human interaction as closely as possible.
If users have a clear objective, they can directly type or speak their requests. Conversely, in
exploratory search, users are guided in iteratively creating and refining questions.
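The iterative, step-by-step formulation described above can be illustrated with a small builder in which every user choice adds one constraint and the same state is rendered both as a controlled-NL sentence and as SPARQL. This is a hypothetical sketch, not ELODIE's implementation: the class `QueryBuilder`, its methods, and the DBpedia IRIs in the usage lines are assumptions. The `facet_query` method shows one plausible way to fetch the properties available for the next refinement step.

```python
from dataclasses import dataclass, field


@dataclass
class QueryBuilder:
    """Hypothetical controlled-NL query builder: one constraint per user step."""

    class_iri: str
    class_label: str
    # Each constraint is (property IRI, value IRI, controlled-NL phrase).
    constraints: list[tuple[str, str, str]] = field(default_factory=list)

    def where(self, prop_iri: str, value_iri: str, phrase: str) -> "QueryBuilder":
        self.constraints.append((prop_iri, value_iri, phrase))
        return self  # allow chaining, mirroring step-by-step refinement

    def _patterns(self) -> list[str]:
        lines = [f"  ?item a <{self.class_iri}> ."]
        lines += [f"  ?item <{p}> <{v}> ." for p, v, _ in self.constraints]
        return lines

    def to_natural_language(self) -> str:
        sentence = f"Give me every {self.class_label}"
        if self.constraints:
            sentence += " " + " and ".join(phrase for _, _, phrase in self.constraints)
        return sentence

    def to_sparql(self, limit: int = 100) -> str:
        body = "\n".join(self._patterns())
        return f"SELECT DISTINCT ?item WHERE {{\n{body}\n}}\nLIMIT {limit}"

    def facet_query(self, limit: int = 20) -> str:
        # Which properties do the current matches use? Used to suggest the next step.
        body = "\n".join(self._patterns() + ["  ?item ?property ?value ."])
        return (
            "SELECT ?property (COUNT(DISTINCT ?item) AS ?matches) WHERE {\n"
            f"{body}\n}}\nGROUP BY ?property\nORDER BY DESC(?matches)\nLIMIT {limit}"
        )


if __name__ == "__main__":
    # Assumed DBpedia IRIs, for illustration only.
    builder = QueryBuilder("http://dbpedia.org/ontology/Artwork", "artwork")
    builder.where(
        "http://dbpedia.org/ontology/author",
        "http://dbpedia.org/resource/Vincent_van_Gogh",
        "whose author is Vincent van Gogh",
    )
    print(builder.to_natural_language())
    print(builder.to_sparql())
```

Keeping one shared state for both renderings means the sentence the user reads and the query that is executed cannot drift apart.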
    Acting as query builders, these interfaces translate NL queries into SPARQL, which is run over a
SPARQL endpoint (i.e., a service that publicly exposes KG content). Among SPARQL query forms,
SELECT queries return results that can be naturally represented as tabular data, with one column
per projected variable and one row per solution. Thus, retrieved data are modelled as tables, which
can be refined manually or automatically and, finally, used in data exploitation mechanisms.
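Because a SELECT result has one column per projected variable and one row per solution, turning it into a table is mechanical. The sketch below assumes the endpoint returns the W3C SPARQL 1.1 Query Results JSON Format (`head.vars` plus `results.bindings`); the function name `bindings_to_table` is hypothetical and the sample payload uses made-up example.org IRIs. Variables left unbound in a solution, for instance by an OPTIONAL pattern, are absent from that binding and become empty cells (`None`).

```python
import json
from typing import Optional

Row = list[Optional[str]]


def bindings_to_table(results: dict) -> tuple[list[str], list[Row]]:
    """Convert a SPARQL 1.1 JSON SELECT result into (header, rows)."""
    header: list[str] = results["head"]["vars"]
    rows: list[Row] = []
    for binding in results["results"]["bindings"]:
        # A variable missing from a binding is unbound: keep an empty cell.
        rows.append([binding[var]["value"] if var in binding else None for var in header])
    return header, rows


SAMPLE = """{
  "head": {"vars": ["artwork", "museum"]},
  "results": {"bindings": [
    {"artwork": {"type": "uri", "value": "http://example.org/artwork/1"},
     "museum": {"type": "uri", "value": "http://example.org/museum/A"}},
    {"artwork": {"type": "uri", "value": "http://example.org/artwork/2"}}
  ]}
}"""

if __name__ == "__main__":
    header, rows = bindings_to_table(json.loads(SAMPLE))
    print(header)
    for row in rows:
        print(row)
```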
    Data exploitation may result in textual replies or in concrete, possibly customizable and
exportable, artifacts such as charts, data visualizations, data stories, or virtual reality (VR) based
data representations.
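As one example of an exportable artifact, a refined table can be serialized into a declarative chart specification. The sketch below uses the Vega-Lite JSON format purely as an illustrative choice (the text above does not name a chart library) and also produces a one-line textual reply; the functions `bar_chart_spec` and `textual_reply` and the sample table are hypothetical.

```python
import json


def bar_chart_spec(header: list[str], rows: list[list], x: str, y: str, title: str = "") -> dict:
    """Build a Vega-Lite bar chart spec from a table; x is nominal, y is numeric."""
    xi, yi = header.index(x), header.index(y)
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": [{x: row[xi], y: row[yi]} for row in rows]},
        "mark": "bar",
        "encoding": {
            "x": {"field": x, "type": "nominal"},
            "y": {"field": y, "type": "quantitative"},
        },
    }
    if title:
        spec["title"] = title
    return spec


def textual_reply(header: list[str], rows: list[list], x: str, y: str) -> str:
    """Summarize the table in one sentence, e.g. for a voice or chat reply."""
    if not rows:
        return "No results were found."
    xi, yi = header.index(x), header.index(y)
    top = max(rows, key=lambda row: row[yi])
    return f"{len(rows)} rows; top {x} by {y}: {top[xi]} ({top[yi]})."


if __name__ == "__main__":
    header = ["museum", "artworks"]  # hypothetical refined table
    rows = [["Museum A", 2], ["Museum B", 1]]
    print(textual_reply(header, rows, "museum", "artworks"))
    # The JSON text is the exportable, reusable artifact.
    print(json.dumps(bar_chart_spec(header, rows, "museum", "artworks"), indent=2))
```

A declarative spec keeps the data and the visual encoding in one portable document, which is what makes the chart easy to export and reuse.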


3. Proposed Prototypes and Explored Scenarios
To instantiate the general approach represented in Fig. 1, I considered target audiences that might
be interested in accessing KGs and, consequently, designed, prototyped, and evaluated tools
able to satisfy their needs. I focus on Open Data (OD) experts and Public Administrations (PAs),
education, and the Cultural Heritage (CH) community as target groups.




Open Data Experts and QueDI. In recent years, my research lab, ISISLab, managed a
European project, ROUTE-TO-PA, to support users in publishing high-quality OD and exploiting
them effectively. It resulted in SPOD, a Social Platform for OD [13], a virtual place where users
can co-create and exploit OD. SPOD is used by data producers and data enthusiasts (e.g., citizens,
CH communities, PAs) with OD management skills, i.e., table manipulation and chart creation.
   The problem I aimed to solve is how to enable this target group to access LOD and KGs
without explicitly using SPARQL, while relying on their expertise in OD management. I proposed
a transitional approach in which OD experts are guided from LOD querying to their comfort zone.
It resulted in QueDI (Query Data of Interest), which allows users to build queries step by step
with an auto-complete mechanism and to exploit retrieved results through exportable and dynamic
visualizations. First, QueDI scaffolds users in creating a tabular representation of the dataset of
interest through ELODIE, a query builder enhanced with a controlled NL interface. ELODIE (whose
interface is shown in Fig. 2) realizes an exploratory search by organizing available data in facets
and by automatically retrieving, from a configured SPARQL endpoint, both the results of the user
query and the data needed to continue formulating it. Second, QueDI supports a manual dataset
manipulation phase where users can exploit their data refinement skills by aggregating, sorting,
filtering, and cleaning data through a form-based interface that behaves as an SQL builder. Finally,
it enables the creation of exportable and reusable visualizations. QueDI is freely available online¹.
Besides its accuracy, expressivity, and scalability [14], it proves usable according to its System
Usability Scale (SUS) score [14, 15].
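The form-based refinement step, where a form behaves as an SQL builder, can be sketched as a translation from form state to a parameterized query. Everything here is an assumption for illustration: the form fields (`filters`, `group_by`, `aggregate`, `order_by`), the helpers `build_sql` and `sample_connection`, the toy table, and the use of SQLite as the execution engine are not taken from QueDI. Identifiers are quoted, operators and aggregates are whitelisted, and filter values are passed as query parameters rather than concatenated into the SQL text.

```python
import sqlite3

OPERATORS = {"=", "!=", "<", "<=", ">", ">="}
AGGREGATES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (column or table name)."""
    return '"' + name.replace('"', '""') + '"'


def build_sql(table: str, form: dict) -> tuple[str, list]:
    """Translate a hypothetical refinement form into (sql, params)."""
    group_by = form.get("group_by")
    aggregate = form.get("aggregate")  # e.g. ("COUNT", "*")
    columns = [quote_ident(group_by)] if group_by else []
    if aggregate:
        func, col = aggregate
        if func not in AGGREGATES:
            raise ValueError(f"unsupported aggregate: {func}")
        columns.append(f"{func}({'*' if col == '*' else quote_ident(col)}) AS value")
    sql = f"SELECT {', '.join(columns) or '*'} FROM {quote_ident(table)}"

    params: list = []
    clauses = []
    for col, op, value in form.get("filters", []):
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{quote_ident(col)} {op} ?")
        params.append(value)  # values are bound, never concatenated
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if group_by:
        sql += f" GROUP BY {quote_ident(group_by)}"
    if form.get("order_by"):
        col, direction = form["order_by"]
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unsupported direction: {direction}")
        sql += f" ORDER BY {quote_ident(col)} {direction}"
    return sql, params


def sample_connection() -> sqlite3.Connection:
    """In-memory toy table standing in for a dataset retrieved from a KG."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artworks (artwork TEXT, museum TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO artworks VALUES (?, ?, ?)",
        [("A1", "M1", 1888), ("A2", "M1", 1889), ("A3", "M2", 1887), ("A4", "M3", 1890)],
    )
    return conn


SAMPLE_FORM = {
    "filters": [("year", ">=", 1888)],
    "group_by": "museum",
    "aggregate": ("COUNT", "*"),
    "order_by": ("museum", "ASC"),
}

if __name__ == "__main__":
    sql, params = build_sql("artworks", SAMPLE_FORM)
    print(sql)
    print(sample_connection().execute(sql, params).fetchall())
```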
Education. I proposed QueDI as a KM tool in the educational context to support future citizens
in going beyond the passive inspection of results returned by a search engine, and in actively
searching for the data that best answer their questions (as described in [16]).
   Moreover, I investigated the implicit exploitation of KGs for synonym lookup in Novelette [17]
(freely available on GitHub²), a digital storytelling environment where storytellers can create
stories to graphically represent tales, data stories, or media stories. If users experience writer’s
block, Novelette offers a suggestion mechanism: users type the word of interest, and Novelette
automatically retrieves synonyms by querying BabelNet [18] and organizes the retrieved results in
(navigable) word clouds. It thus provides a keyword-based interface to implicitly explore KGs by
navigating synonyms.
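Organizing synonyms in a navigable word cloud amounts to (i) scaling each word's font size by some relevance weight and (ii) letting a click on a word start a new lookup. The sketch below is hypothetical: it does not call BabelNet, whose API is not described here. Instead, `lookup` is any function returning synonym weights, the names `word_cloud_sizes` and `SynonymExplorer` are illustrative, the toy weights are hand-written, and the linear scaling between 12 and 48 pixels is an arbitrary choice.

```python
from typing import Callable

Lookup = Callable[[str], dict[str, float]]


def word_cloud_sizes(weights: dict[str, float], min_px: int = 12, max_px: int = 48) -> dict[str, int]:
    """Map synonym weights linearly onto font sizes in [min_px, max_px]."""
    if not weights:
        return {}
    low, high = min(weights.values()), max(weights.values())
    span = high - low
    sizes = {}
    for word, weight in weights.items():
        # With equal weights (span == 0), every word gets the largest size.
        ratio = (weight - low) / span if span else 1.0
        sizes[word] = round(min_px + ratio * (max_px - min_px))
    return sizes


class SynonymExplorer:
    """Navigable cloud: opening a word looks up its synonyms; back() returns."""

    def __init__(self, lookup: Lookup) -> None:
        self.lookup = lookup
        self.history: list[str] = []

    def open(self, word: str) -> dict[str, int]:
        self.history.append(word)
        return word_cloud_sizes(self.lookup(word))

    def back(self) -> dict[str, int]:
        if len(self.history) > 1:
            self.history.pop()
        return word_cloud_sizes(self.lookup(self.history[-1])) if self.history else {}


# Toy, hand-written weights standing in for a real synonym service.
TOY_SYNONYMS = {"happy": {"glad": 2, "joyful": 3}, "glad": {"happy": 2, "pleased": 1}}

if __name__ == "__main__":
    explorer = SynonymExplorer(lambda w: TOY_SYNONYMS.get(w, {}))
    print(explorer.open("happy"))
    print(explorer.open("glad"))
    print(explorer.back())
```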
Cultural Heritage Community. In the last year, virtual exhibitions have been widely adopted
to enhance physical tours. I propose to take advantage of CH KGs in an authoring platform
for VR-based virtual exhibitions by combining ELODIE with an automatic mechanism that creates
VR-based solutions [19]. Suppose I am a Van Gogh lover who wishes to visit all the museums
that hold at least one of his paintings. I can query DBpedia through ELODIE and collect all the
artworks painted by Van Gogh, also retrieving information related to their geographical
location. The corresponding query formulated in ELODIE and the resulting set of replies are
shown in Fig. 2. Once I am satisfied with the retrieved results, instead of visualizing them in a
chart as in QueDI, the generator guides me in creating VR-based virtual exhibitions³.
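For readers curious about what ELODIE hides, the sketch below shows a hand-written SPARQL query of the kind the Van Gogh scenario needs, sent to the public DBpedia endpoint via the standard SPARQL protocol (HTTP GET with a `query` parameter, JSON results requested through the `Accept` header). It is not the query ELODIE generated, which is the one shown in Fig. 2: the choice of `dbo:author`, `dbo:museum`, and the WGS84 `geo:lat`/`geo:long` properties is an assumption about how DBpedia models these facts, so results may be incomplete. The helpers `build_request`, `fetch`, and `museums` are hypothetical; `museums` lists the distinct museums to include in the virtual exhibition.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Assumed DBpedia modelling: paintings link to the artist via dbo:author and to
# the hosting museum via dbo:museum; coordinates use the W3C WGS84 vocabulary.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT DISTINCT ?artwork ?museum ?lat ?long WHERE {
  ?artwork dbo:author dbr:Vincent_van_Gogh ;
           dbo:museum ?museum .
  OPTIONAL { ?museum geo:lat ?lat ; geo:long ?long . }
}
ORDER BY ?museum"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def fetch(endpoint: str = ENDPOINT, query: str = QUERY) -> dict:
    """Run the query (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)


def museums(results: dict) -> list[str]:
    """Distinct museum IRIs in a SPARQL JSON result, sorted."""
    return sorted({b["museum"]["value"] for b in results["results"]["bindings"] if "museum" in b})


if __name__ == "__main__":
    for museum in museums(fetch()):
        print(museum)
```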
¹ QueDI on GitHub: https://github.com/routetopa/deep2-components/tree/master/controllets/splod-controllet;
demo: https://deep.routetopa.eu/deep2/COMPONENTS/controllets/splod-visualization-controllet/demo.html
² Novelette links: https://github.com/routetopa/storylet, http://www.isislab.it:19984/en/home-page-2/
³ Use case, dataset creation: https://youtu.be/63SmstO_x78.
Virtual exhibition tour and download: https://youtu.be/9LNdFY_2OJw; https://www.isislab.it/en/virtual-museum/.




Figure 2: ELODIE interface in the Van Gogh use case.


   Moreover, I investigated how to make Virtual Assistants (VAs) compatible with KGs. This
resulted in a community-shared software framework (a generator) that enables lay users
to create ready-to-use custom extensions for performing QA over KGs [20, 21]. This proposal
represents a step forward in enabling direct search and lookup over KGs.
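A question-answering extension for a virtual assistant can be thought of as a set of intents, each pairing utterance patterns with a SPARQL template. The sketch below is a hypothetical, deliberately simple version of that idea, not the generator's actual design: the intent catalogue, the `{entity}` slot syntax, the functions `handle` and `verbalize`, the DBpedia `dbo:author` property, and the `resolve` function (which maps a mention to an IRI) are all assumptions. On an actual VA platform, utterance matching would typically be handled by the platform itself.

```python
import re
from string import Template
from typing import Callable, Optional

# Hypothetical intent catalogue: utterance patterns, a SPARQL template, a reply.
INTENTS = {
    "author_of": {
        "utterances": ["who painted {entity}", "who is the author of {entity}"],
        "sparql": Template(
            "SELECT ?answer WHERE { <$entity> <http://dbpedia.org/ontology/author> ?answer . }"
        ),
        "reply": "{entity} was painted by {answer}.",
    },
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn 'who painted {entity}' into a regex with a named group per slot."""
    return re.compile(re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern)))


def handle(utterance: str, resolve: Callable[[str], Optional[str]]) -> Optional[dict]:
    """Match an utterance to an intent and instantiate its SPARQL template."""
    text = utterance.strip().rstrip("?").strip().lower()
    for name, intent in INTENTS.items():
        for pattern in intent["utterances"]:
            match = pattern_to_regex(pattern).fullmatch(text)
            if not match:
                continue
            mention = match.group("entity")
            iri = resolve(mention)
            if iri is None:
                return None  # entity not found in the KG
            return {
                "intent": name,
                "mention": mention,
                "sparql": intent["sparql"].substitute(entity=iri),
            }
    return None


def verbalize(intent: str, mention: str, answers: list[str]) -> str:
    """Turn answer labels into the sentence the assistant will speak."""
    if not answers:
        return "Sorry, I could not find an answer."
    return INTENTS[intent]["reply"].format(entity=mention, answer=", ".join(answers))


if __name__ == "__main__":
    # Hypothetical resolver: a fixed dictionary instead of real entity linking.
    resolve = {"the starry night": "http://dbpedia.org/resource/The_Starry_Night"}.get
    print(handle("Who painted The Starry Night?", resolve))
```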


4. Conclusions
KGs are crucial for KM and active IR, but their query languages are difficult to use, especially
for lay users. Thus, I aim to unlock the potential of KGs by enabling a natural human-KG
interaction. I proposed solutions to explore KGs (such as ELODIE) or to directly look up data
of interest (such as through VA devices). Furthermore, the proposed data exploitation mechanisms
enable agile understanding and sharing of the acquired knowledge to support discussions. For each
target audience, I designed an interface that responds to specific and concrete needs and relies on
users’ skills without requiring them to become familiar with data formats and query languages.


References
 [1] H. Paulheim, Knowledge graph refinement: A survey of approaches and evaluation
     methods, Semantic Web 8 (2016) 489–508.




 [2] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the linked data best practices in
     different topical domains, in: Proc. of ISWC, Springer, 2014, pp. 245–260.
 [3] J. P. McCrae, The LOD cloud, 2007. URL: http://lod-cloud.net, accessed 2021/04/21.
 [4] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann,
     M. Morsey, P. Van Kleef, S. Auer, et al., DBpedia – A large-scale, multilingual knowledge
     base extracted from Wikipedia, Semantic Web 6 (2015) 167–195.
 [5] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledge base, Communications
     of the ACM 57 (2014) 78–85.
 [6] H. Vargas, C. B. Aranda, A. Hogan, C. López, RDF explorer: A visual SPARQL query
     builder, in: Proc. of ISWC, volume 11778, Springer, 2019, pp. 647–663.
 [7] P. Bellini, P. Nesi, A. Venturi, Linked open graph: Browsing multiple SPARQL entry points
     to build your own LOD views, J. of Visual Languages & Computing 25 (2014) 703–716.
 [8] G. Marchionini, Exploratory search: from finding to understanding, Communications of
     the ACM 49 (2006) 41–46.
 [9] M. Hearst, Search user interfaces, Cambridge University Press, 2009.
[10] D. Tunkelang, Dynamic category sets: An approach for faceted search, in: ACM SIGIR,
     volume 6, Citeseer, 2006.
[11] P. Morville, J. Callender, Search patterns: design for discovery, O’Reilly Media, Inc., 2010.
[12] T. Russell-Rose, T. Tate, Designing the search experience: The information architecture of
     discovery, Newnes, 2012.
[13] G. Cordasco, R. De Donato, D. Malandrino, G. Palmieri, A. Petta, D. Pirozzi, G. Santangelo,
     V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini, Engaging citizens with a social platform
     for open data, in: Proc. of Digital Government Research, ACM, 2017, pp. 242–249.
[14] R. De Donato, M. Garofalo, D. Malandrino, M. A. Pellegrino, A. Petta, V. Scarano, QueDI:
     From knowledge graph querying to data visualization, Semantic Systems (2020) 70.
[15] R. De Donato, M. Garofalo, D. Malandrino, M. A. Pellegrino, A. Petta, V. Scarano, Linked
     data queries by a trialogical learning approach, in: Proc. of Int. Conf. on Computer
     Supported Cooperative Work in Design, CSCWD, IEEE, 2019, pp. 117–122.
[16] R. De Donato, M. Garofalo, D. Malandrino, M. A. Pellegrino, A. Petta, Education meets
     knowledge graphs for the knowledge management, in: Proc. of Methodologies and
     Intelligent Systems for Technology Enhanced Learning, Springer, 2020, pp. 272–280.
[17] A. Addone, R. De Donato, G. Palmieri, M. A. Pellegrino, A. Petta, V. Scarano, L. Serra,
     Visual storytelling by novelette, in: Proc. of IV, IEEE, 2020, pp. 723–728.
[18] R. Navigli, S. P. Ponzetto, BabelNet: The automatic construction, evaluation and application
     of a wide-coverage multilingual semantic network, Artificial Intelligence 193 (2012) 217–250.
[19] D. Monaco, M. A. Pellegrino, V. Scarano, L. Vicidomini, The role of linked open data in
     authoring virtual exhibitions, 2021. Submitted to Journal of Cultural Heritage in Dec. 2020.
[20] M. A. Pellegrino, M. Santoro, V. Scarano, C. Spagnuolo, Automatic skill generation for
     knowledge graph question answering, in: Proc. of ESWC, 2021.
[21] M. A. Pellegrino, V. Scarano, C. Spagnuolo, Move cultural heritage knowledge graphs in
     everyone’s pocket, 2021. Submitted to Semantic Web Journal in Mar. 2021.

