=Paper=
{{Paper
|id=Vol-2941/paper3
|storemode=property
|title=Do you need a knowledge graph? Helping organizations determine whether a knowledge graph is needed for their problems with the Smals KG Checklist
|pdfUrl=https://ceur-ws.org/Vol-2941/paper3.pdf
|volume=Vol-2941
|authors=Christophe Debruyne,Katy Fokou,Paul Stijfhals
|dblpUrl=https://dblp.org/rec/conf/i-semantics/DebruyneFS21
}}
==Do you need a knowledge graph? Helping organizations determine whether a knowledge graph is needed for their problems with the Smals KG Checklist==
<pdf width="1500px">https://ceur-ws.org/Vol-2941/paper3.pdf</pdf>
<pre>
 Do you need a knowledge graph? Helping organizations
determine whether a knowledge graph is needed for their
         problems with the Smals KG Checklist

                  *
                   Christophe Debruyne, Katy Fokou, and Paul Stijfhals

              Smals Research, Smals, Avenue Fonsny 20, 1060 Brussels, Belgium
                                first.last@smals.be


        Abstract. A knowledge graph (KG) is a graph that needs to fulfill specific criteria
        and is key in unlocking data silos and tacit information for innovative applications.
        However, organizations may find it hard to identify use cases for KGs and even to
        assess whether a KG is a viable approach for a particular problem.
        Within Smals Research, we designed the Smals KG Checklist, which guides a
        group of stakeholders in determining whether KG technologies can be used to
        address a concrete problem. The checklist is to be used in a workshop environment
        where a facilitator guides the stakeholders in filling in the checklist.


        Keywords: knowledge graph engineering, project planning, project management


1       Introduction
Smals is a non-profit organization that realizes innovative ICT projects for mostly Bel-
gian (semi-)government departments and other affiliated members. Within Smals, the
Smals Research† department investigates opportunities for using, deploying, and devel-
oping skills and know-how for emerging and innovative technologies for its members.
One of these technologies, amongst many others, is knowledge graphs (KGs) [1].
   It has become apparent that business analysts‡ and members (from now on called
stakeholders) have difficulty understanding the concept of KGs. Stakeholders do have
a good understanding of the problems they face, but do not know how these problems
can be solved using KGs. Anecdotally, for instance, stakeholders have shown interest
in recommender systems to propose related information, whereas the underlying need
turned out to be “simply” combining information about the same entity from two
different silos, i.e., a data integration problem. Stakeholders are familiar with graph
databases as they have been adopted to solve specific problems such as graph analytics.
Still, these graphs cannot

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
† https://www.smalsresearch.be/
‡ Smals employees who work closely with its members to comprehend their business needs,
  translate those into ICT projects, and follow up on and manage the realization of these projects.

(yet) be regarded as KGs, which adds to the confusion.
   To help stakeholders understand the concept of a KG and, more importantly, to assist
them in assessing whether a KG is a viable approach in tackling concrete problems, we
have designed the Smals KG Checklist§. The checklist is to be used in a collaborative
setting such as a workshop in which stakeholders provide input. The checklist acts as a
guide for the workshop facilitator, and it is up to the facilitator to capture and refine the
stakeholders’ input.

2      Graphs vs. Knowledge Graphs
In [1], the authors analyzed various definitions of the term KG. While there is no con-
sensus, [1] allows us to summarize that a KG is a graph (representing entities and their
relationships) that fulfills three criteria: C1) the KG has a non-trivial schema or ontol-
ogy; C2) the KG integrates information from heterogeneous sources; and C3) the KG
is used to gain insights by inferring implicit information from explicit information (ei-
ther via the schema, machine learning, or tooling on top of the KG).
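
As an illustration of criterion C3 via the schema, the following minimal sketch derives implicit
statements from explicit ones. The triples, class names, and the two subclass rules are
hypothetical examples chosen for illustration; they are not taken from a Smals project, and a
real KG would typically rely on a reasoner rather than hand-written rules.

```python
# Hypothetical explicit statements as (subject, predicate, object) triples.
EXPLICIT = {
    ("Inspector", "subClassOf", "Employee"),
    ("Employee", "subClassOf", "Person"),
    ("alice", "type", "Inspector"),
}


def infer(triples):
    """Apply two RDFS-style rules until no new triples appear.

    Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
    Rule 2: x type A,       A subClassOf B  =>  x type B
    """
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # Only chain through a subclass statement that starts at o.
                if p2 != "subClassOf" or o != s2:
                    continue
                if p in ("subClassOf", "type"):
                    new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new


# Statements that were never stated explicitly but follow from the schema.
IMPLICIT = infer(EXPLICIT) - EXPLICIT
```

Here, the fact that alice is a Person is never stated; it follows only from the class hierarchy.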
   KGs are typically stored in graph databases, but graph databases are also used in
other scenarios. Graph databases are necessary for graph analytics and may solve issues
stemming from a relational database’s computational limitations. An example of the
latter is Neo4j’s use case on access and identity management. Neo4j reported on the
migration of a relational database to a graph database to avoid expensive recursive
joins.** As the data was merely migrated and not all criteria were met, we argue that
this project did not yield a KG.
   How can we determine whether a project needs a graph or a KG? To determine
whether a particular problem can be solved with a KG, we need to determine whether
a solution requires the three criteria to be met.
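
The decision rule above can be sketched as a conjunction of the three criteria. The class name,
field names, and the two example assessments below are assumptions made for illustration; in
particular, the values for the plain migration are hypothetical and not reported by any source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaAssessment:
    """Outcome of assessing one problem against the three KG criteria."""

    nontrivial_schema: bool      # C1: non-trivial schema or ontology
    heterogeneous_sources: bool  # C2: integrates heterogeneous sources
    inferred_insights: bool      # C3: implicit information is inferred

    def unmet(self):
        """Return the labels of the criteria that are not required or met."""
        checks = {
            "C1": self.nontrivial_schema,
            "C2": self.heterogeneous_sources,
            "C3": self.inferred_insights,
        }
        return [label for label, ok in checks.items() if not ok]

    def needs_knowledge_graph(self):
        """A KG is indicated only when all three criteria hold."""
        return not self.unmet()


# Hypothetical assessments: a plain database migration versus an
# integration project that also requires an ontology and inference.
plain_migration = CriteriaAssessment(True, False, False)
integration_project = CriteriaAssessment(True, True, True)
```

Listing the unmet criteria, rather than returning only a yes or no, gives a facilitator something
concrete to discuss with the stakeholders.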

3      Related Work
KG technology vendors often publish whitepapers and blog posts on the successful
application of KGs. Organizations themselves also report on their use cases and lessons
learned (e.g., [2]). While valuable, not all organizations are faced with the same problems,
and organizations do not necessarily have the expertise to extrapolate those examples to
similar cases. There are also resources aimed at both academia and industry (e.g., [3]).
While they also provide valuable information on everything involved in a KG project
(activities, methodologies, techniques, etc.) and examples, they do not provide a tool such
as our checklist to determine the applicability of KG technologies for a problem. The
Smals KG Checklist, which we present in this paper, thus addresses this critical gap.
   We designed a tool that can be used in a workshop setting, much like a business
model canvas or an ethics canvas [4]. As we start from concrete problems and the
questions we need answered are well-scoped, we did not pattern our checklist after
these canvases.

§ Made available under a CC BY-NC-SA 4.0 license via
  https://www.smalsresearch.be/wp-content/uploads/2021/06/smals-kg-checklist.pdf
** https://neo4j.com/blog/enterprise-identity-access-management/, last accessed May 25th, 2021

4      The Smals KG Checklist
The checklist consists of two parts (see Fig. 1 and Fig. 2). In Part I, we first identify the
problem, stakeholders, and core concepts. Then we aim to answer three questions by
filling in Part II (Fig. 2). These three questions are related to the criteria of a KG
mentioned in Section 2. The final question in Part I is used to identify future
opportunities for the KG, or its applicability in the longer term. Once filled in, and
refined over time, the checklist can be used to determine whether KG technologies are
needed to solve a particular problem. The questions on KG criteria are given a color,
and these colors reappear in the sections of the second part: purple corresponds with
C1, green with C2, and orange with C3. These sections are key in determining whether
all criteria are (or have to be) met:
•   Section I is used to list the sources that will inform our schema. Section I has two
    colors as the integration of an existing database could appear in both the bottom-
    up integration of structured data (Section II in Fig. 2) and as input for the KG’s
    schema by lifting its database schema.
•   Sections II and IV are concerned with integrating structured and unstructured data,
    respectively. The integration of provenance information, metadata, annotations,
    etc. for the data mentioned in Sections II and IV is captured in Section III, which
    is therefore placed between Sections II and IV. We have noticed that it helps to
    ask this question explicitly.
•   Sections V and VI are respectively concerned with symbolic reasoning (e.g., OWL
    reasoning) and machine learning. As reasoning over the KG requires an ontology,
    there is an arrow from Section II to Section I. Reasoning and AI techniques are
    clear indications of gathering insights from the KG and are therefore orange. We
    can argue that retrieving information according to Linked Data principles does not
    necessarily help one gain insights, but visualization tools (e.g., [5, 6]) and faceted
    browsing (e.g., [7, 8]), amongst others, may. This is why Section VII, concerned
    with applications on top of the KG, has a gradient fill instead of a complete fill.
    We deem the criterion w.r.t. Section VII fulfilled when the need for such tools to
    gather insights is mentioned.
    We use one of Smals Research’s KG projects with the Belgian Social Security to
illustrate the checklist by filling in the forms. This project aims to integrate the data of
three databases to which inspectors have access but are unable to query as a whole.
Integrating the data into a KG requires an ontology and would greatly facilitate the
inspectors’ work by allowing them to gain insights into, for instance, a person’s
employment history.
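
The way the sections of Part II feed into the three criteria can be sketched as a simple lookup.
This is only an illustrative reading of the checklist, not the authors' tooling: the function
names and the example answers are hypothetical, assigning Section III to C2 is an assumption,
and Section I is mapped to C1 only, although it carries a second color on the form.

```python
# Part II sections and the KG criterion each one provides evidence for.
SECTION_TO_CRITERION = {
    "I": "C1",    # sources that inform the schema
    "II": "C2",   # integration of structured data
    "III": "C2",  # provenance, metadata, annotations (assumed to count for C2)
    "IV": "C2",   # integration of unstructured data
    "V": "C3",    # symbolic reasoning
    "VI": "C3",   # machine learning
    "VII": "C3",  # applications used to gather insights
}


def criteria_covered(answers):
    """Return the criteria for which at least one section has an entry."""
    return {
        SECTION_TO_CRITERION[section]
        for section, entries in answers.items()
        if entries
    }


def kg_indicated(answers):
    """A KG is indicated when the filled-in sections cover C1, C2, and C3."""
    return criteria_covered(answers) == {"C1", "C2", "C3"}


# Hypothetical answers loosely modeled on a three-database integration project.
pilot = {
    "I": ["schemas of the existing databases"],
    "II": ["database A", "database B", "database C"],
    "V": [],
    "VII": ["tool for exploring employment histories"],
}
```

An empty section, such as Section V here, contributes nothing, so the outcome depends only on
what the stakeholders actually wrote down.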

5      Summary
It is difficult for organizations to identify when KG technologies are a viable means to
an end. There are many valuable resources on KGs available, but they either focus on
constructing KGs or report on KG use cases that organizations need to extrapolate. To
help organizations determine whether a KG will help tackle a concrete problem, we
designed the Smals KG Checklist. The tool is meant to be filled in by a group of
stakeholders, with a facilitator taking the lead and guiding the discussion.
Disclaimer. The views and opinions expressed in this paper are those of the authors and
do not express the views or opinions of Smals.

Acknowledgements. We thank Gunther Hellebaut, Karel Van Eeckhoutte, and Catherine
Vanden Daelen for their feedback.

References
1.   Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutiérrez, C., Gayo,
     J.E.L., Kirrane, S., Neumaier, S., Polleres, A., Navigli, R., Ngomo, A.-C.N., Rashid, S.M.,
     Rula, A., Schmelzeisen, L., Sequeda, J.F., Staab, S., Zimmermann, A.: Knowledge Graphs.
     CoRR. abs/2003.0, (2020).
2.   Hubauer, T., Lamparter, S., Haase, P., Herzig, D.M.: Use Cases of the Industrial
     Knowledge Graph at Siemens. In: van Erp, M., Atre, M., López, V., Srinivas, K., and
     Fortuna, C. (eds.) Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and
     Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference
     (ISWC 2018), Monterey, USA, October 8th to 12th, 2018. CEUR-WS.org (2018).
3.   Pan, J.Z., Vetere, G., Gómez-Pérez, J.M., Wu, H. eds: Exploiting Linked Data and
     Knowledge Graphs in Large Organisations. Springer (2017). https://doi.org/10.1007/978-
     3-319-45654-6.
4.   Reijers, W., Koidl, K., Lewis, D., Pandit, H.J., Gordijn, B.: Discussing Ethical Impacts in
     Research and Innovation: The Ethics Canvas. In: Kreps, D., Ess, C., Leenen, L., and
     Kimppa, K. (eds.) This Changes Everything - ICT and Climate Change: What Can We Do?
     - 13th IFIP TC 9 International Conference on Human Choice and Computers, HCC13 2018,
     Held at the 24th IFIP World Computer Congress, WCC 2018, Poznan, Poland, September
     19-21, 2018, Proceedi. pp. 299–313. Springer (2018). https://doi.org/10.1007/978-3-319-
     99605-9_23.
5.   Mouromtsev, D., Pavlov, D., Emelyanov, Y., Morozov, A., Razdyakonov, D., Galkin, M.:
     The Simple Web-based Tool for Visualization and Sharing of Semantic Data and
     Ontologies. In: Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located
     with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, PA,
     USA, October 11, 2015. CEUR-WS.org (2015).
6.   Debruyne, C., O’Sullivan, D.: Visually Exploring SPARQL Endpoints with Murmuration.
     In: Garijo, D. and Lawrynowicz, A. (eds.) Proceedings of the EKAW 2020 Posters and
     Demonstrations Session co-located with 22nd International Conference on Knowledge
     Engineering and Knowledge Management (EKAW 2020), Globally online & Bozen-
     Bolzano, Italy, September 17, 2020. pp. 17–21. CEUR-WS.org (2020).
7.   Kharlamov, E., Giacomelli, L., Sherkhonov, E., Grau, B.C., Kostylev, E. V., Horrocks, I.:
     SemFacet. In: Proceedings of the 2017 ACM on Conference on Information and
     Knowledge Management - CIKM ’17. pp. 2475–2478. ACM Press, New York, New York,
     USA (2017). https://doi.org/10.1145/3132847.3133192.
8.   Koho, M., Heino, E., Hyvönen, E.: SPARQL Faceter - Client-side Faceted Search Based
     on SPARQL. In: Troncy, R., Verborgh, R., Nixon, L.J.B., Kurz, T., Schlegel, K., and
     Sande, M. Vander (eds.) Joint Proceedings of the 4th International Workshop on Linked
     Media and the 3rd Developers Hackshop co-located with the 13th Extended Semantic Web
     Conference ESWC 2016, Heraklion, Crete, Greece, May 30, 2016. CEUR-WS.org (2016).


Fig. 1. Part I of the Smals KG Checklist. Red denotes the answers for one of our pilot projects.


Fig. 2. Part II of the Smals KG Checklist. Red denotes the answers for one of our pilot projects.

</pre>