<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Do you need a knowledge graph? Helping organizations determine whether a knowledge graph is needed for their problems with the Smals KG Checklist</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christophe Debruyne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katy Fokou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Stijfhals</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Smals Research</institution>
          ,
          <addr-line>Smals, Avenue Fonsny 20, 1060 Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>A knowledge graph (KG) is a graph that fulfills specific criteria and is key to unlocking data silos and tacit information for innovative applications. However, organizations may find it hard to identify use cases for KGs, and even to assess whether a KG is a viable approach for a particular problem. Within Smals Research, we designed the Smals KG Checklist, which guides a group of stakeholders in determining whether KG technologies can be used to address a concrete problem. The checklist is meant to be used in a workshop setting in which a facilitator guides the stakeholders in filling in the checklist.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge graph engineering</kwd>
        <kwd>project planning</kwd>
        <kwd>project management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Smals is a non-profit organization that realizes innovative ICT projects for mostly
Belgian (semi-)government departments and other affiliated members. Within Smals, the
Smals Research† department investigates opportunities for using, deploying, and
developing skills and know-how for emerging and innovative technologies for its members.
One of these technologies, amongst many others, is knowledge graphs (KGs) [1].</p>
      <p>It has become apparent that business analysts‡ and members (from now on called
stakeholders) have difficulty understanding the concept of KGs. Stakeholders have
a good understanding of the problems they face, but do not know how these problems
can be solved using KGs. Anecdotally, for instance, stakeholders have shown interest
in recommender systems to propose related information, whereas the need turned out to be
“simply” information about the same entity from two different silos, i.e., a data
integration problem. Stakeholders are familiar with graph databases as they have been
adopted to solve specific problems such as graph analytics. Still, these graphs cannot
(yet) be regarded as KGs, which adds to the confusion.</p>
      <p>Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).</p>
      <p>† https://www.smalsresearch.be/</p>
      <p>‡ Smals employees who work closely with its members to comprehend their business needs,
translate those into ICT projects, and follow up on and manage the realization of these projects.</p>
      <p>To help stakeholders understand the concept of a KG and, more importantly, to assist
them in assessing whether a KG is a viable approach to tackling concrete problems, we
have designed the Smals KG Checklist§. The checklist is to be used in a collaborative
setting such as a workshop in which stakeholders provide input. The checklist acts as a
guide for the workshop facilitator, and it is up to the facilitator to capture and refine the
stakeholders’ input.</p>
    </sec>
    <sec id="sec-2">
      <title>Graphs vs. Knowledge Graphs</title>
      <p>In [1], the authors analyzed various definitions of the term KG. While there is no
consensus, [1] allows us to summarize that a KG is a graph (representing entities and their
relationships) that fulfills three criteria: C1) the KG has a non-trivial schema or
ontology; C2) the KG integrates information from heterogeneous sources; and C3) the KG
is used to gain insights by inferring implicit information from explicit information
(either via the schema, machine learning, or tooling on top of the KG).</p>
      <p>KGs are typically stored in graph databases, but graph databases are also used in
other scenarios. Graph databases are necessary for graph analytics and may solve issues
stemming from a relational database’s computational limitations. An example of the
latter is Neo4j’s use case on access and identity management. Neo4j reported on the
migration of a relational database to a graph database to avoid expensive recursive
joins.** As the data was merely migrated and not all criteria were met, we argue that
this project did not yield a KG.</p>
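      <p>To make the recursive-join issue concrete, the following minimal sketch (not Neo4j's implementation) resolves nested group memberships with a breadth-first traversal. The membership data and the names MEMBER_OF and transitive_groups are hypothetical and serve only as illustration. In a relational database, each additional level of nesting would typically require another self-join or a recursive query; on a graph, the same question is a plain traversal.</p>
      <preformat>
```python
from collections import deque

# Hypothetical membership edges: each key is a member of the listed groups.
MEMBER_OF = {
    "alice": ["developers"],
    "developers": ["engineering"],
    "engineering": ["staff"],
    "bob": ["staff"],
}


def transitive_groups(start, edges):
    """Return every group reachable from `start` via membership edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for group in edges.get(node, []):
            if group not in seen:
                seen.add(group)
                queue.append(group)
    return seen


print(sorted(transitive_groups("alice", MEMBER_OF)))
```
      </preformat>
      <p>Note that such a traversal addresses only a computational limitation: nothing in it involves a schema, heterogeneous sources, or inference, which is why a migration of this kind does not by itself meet criteria C1 to C3.</p>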
      <p>How can we determine whether a project needs a graph or a KG? To determine
whether a particular problem can be solved with a KG, we need to determine whether
a solution requires all three criteria to be met.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>KG technology vendors often publish whitepapers and blog posts on the successful
application of KGs. Organizations also report on their own use cases and lessons learned (e.g.,
[2]). While these reports are valuable, not all organizations face the same problems, and
organizations do not necessarily have the expertise to extrapolate those examples to similar
cases. There are also resources aimed at both academia and industry (e.g., [3]). While
they provide valuable information on everything involved in a KG project
(activities, methodologies, techniques, etc.) as well as examples, they do not provide a tool such as
our checklist to determine the applicability of KG technologies to a problem. The
Smals KG Checklist, which we present in this paper, addresses this critical gap.</p>
      <p>We designed a tool that can be used in a workshop setting, much like a business
model canvas or an ethics canvas [4]. As we start from concrete problems and the
questions we need answered are well-scoped, we did not pattern our checklist after
these canvases.</p>
      <p>§ Made available under a CC BY-NC-SA 4.0 license via
https://www.smalsresearch.be/wp-content/uploads/2021/06/smals-kg-checklist.pdf</p>
      <p>** https://neo4j.com/blog/enterprise-identity-access-management/, last accessed May 25th, 2021</p>
    </sec>
    <sec id="sec-4">
      <title>The Smals KG Checklist</title>
      <p>The checklist consists of two parts (see Fig. 1 and Fig. 2). In Part I, we first identify the
problem, stakeholders, and core concepts. Then we aim to answer three questions by
filling in Part II (Fig. 2). These three questions are related to the criteria of a KG
introduced in the section on graphs vs. knowledge graphs. The final question in Part I is used to identify future
opportunities for the KG, or its applicability in the longer term. Once filled in, and
refined over time, the checklist can be used to determine whether KG technologies are
needed to solve a particular problem. The questions on KG criteria are given a color,
and these colors reappear in the sections of the second part: purple corresponds to
C1, green to C2, and orange to C3. These sections are key in determining whether
all criteria are (or have to be) met:</p>
      <p>Section I is used to list the sources that will inform our schema. Section I has two
colors as the integration of an existing database could appear both in the
bottom-up integration of structured data (Section II in Fig. 2) and as input for the KG’s
schema by lifting its database schema.</p>
      <p>Sections II and IV are concerned with integrating structured and unstructured data,
respectively. The integration of provenance information, metadata,
annotations, etc. for the data mentioned in Sections II and IV is captured in Section III, which is
therefore placed between Sections II and IV. We have noticed that it helps to ask
this question explicitly.</p>
      <p>Sections V and VI are respectively concerned with symbolic reasoning (e.g., OWL
reasoning) and machine learning. As reasoning over the KG requires an ontology,
there is an arrow from Section II to Section I. Reasoning and AI techniques are
clear indications of gathering insights from the KG and are therefore orange. We
can argue that retrieving information according to Linked Data principles does not
necessarily help one gain insights, but visualization tools (e.g., [5, 6]) and faceted
browsing (e.g., [7, 8]), amongst others, may. This is why Section VII, concerned
with applications on top of the KG, has a gradient fill instead of a complete fill.
We deem the criterion for Section VII fulfilled when the need for such tools to
gather insights is mentioned.</p>
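      <p>The mapping between sections, colors, and criteria described above can be sketched as a small lookup. This is an illustrative simplification rather than part of the checklist itself: the names SECTION_CRITERIA, criteria_met, and all_criteria_met are hypothetical, the green entry for Section III is an assumption (the text does not state its color), and a filled-in section is taken as evidence for every criterion whose color it carries. Section VII should only be passed in when the need for insight-gathering tools was mentioned.</p>
      <preformat>
```python
# Criteria each Part II section contributes to (purple = C1, green = C2,
# orange = C3). The entry for Section III is an assumption; the text only
# says that it sits between the two integration sections.
SECTION_CRITERIA = {
    "I": {"C1", "C2"},   # two colors: schema sources and structured data
    "II": {"C2"},        # integration of structured data
    "III": {"C2"},       # provenance, metadata, annotations (assumed green)
    "IV": {"C2"},        # integration of unstructured data
    "V": {"C3"},         # symbolic reasoning
    "VI": {"C3"},        # machine learning
    "VII": {"C3"},       # applications; count only if insight tools are needed
}

ALL_CRITERIA = {"C1", "C2", "C3"}


def criteria_met(filled_sections):
    """Return the criteria covered by the sections that were filled in."""
    met = set()
    for section in filled_sections:
        met |= SECTION_CRITERIA[section]
    return met


def all_criteria_met(filled_sections):
    """True only when the filled-in sections cover C1, C2, and C3."""
    return criteria_met(filled_sections) == ALL_CRITERIA


print(all_criteria_met({"I", "II", "V"}))
print(all_criteria_met({"II", "IV"}))
```
      </preformat>
      <p>In a workshop, the outcome of such a lookup would be a starting point for discussion with the facilitator rather than a verdict, since the checklist is refined over time.</p>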
      <p>We use one of Smals Research’s KG projects with the Belgian Social Security to
illustrate the checklist by filling in the forms. This project aims to integrate the data of
three databases to which inspectors have access but which they are unable to query as a whole.
Integrating the data into a KG requires an ontology and would greatly facilitate their
work by allowing them to gain insights into, for instance, a person’s employment history.</p>
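      <p>As a rough illustration of why integration helps with such queries, the sketch below merges records from three sources into one set of subject-predicate-object triples and then retrieves an employment history across them. All data, identifiers, and function names are invented for this example; the actual databases, ontology, and tooling of the project are not described in this paper.</p>
      <preformat>
```python
# Three hypothetical sources, each holding part of the picture for a person.
SOURCE_A = [{"person": "p1", "employer": "e1", "year": 2018}]
SOURCE_B = [{"person": "p1", "employer": "e2", "year": 2020}]
SOURCE_C = [{"person": "p2", "employer": "e1", "year": 2019}]


def to_triples(records, source_name):
    """Turn each record into (subject, predicate, object) triples."""
    triples = set()
    for index, record in enumerate(records):
        job = f"{source_name}/job{index}"
        triples.add((job, "employee", record["person"]))
        triples.add((job, "employer", record["employer"]))
        triples.add((job, "year", record["year"]))
    return triples


def employment_history(graph, person):
    """List (year, employer) pairs for one person across all sources."""
    jobs = [s for (s, p, o) in graph if p == "employee" and o == person]
    history = []
    for job in jobs:
        facts = {p: o for (s, p, o) in graph if s == job}
        history.append((facts["year"], facts["employer"]))
    return sorted(history)


# Integration step: the union of the triples from all three sources.
graph = (
    to_triples(SOURCE_A, "a")
    | to_triples(SOURCE_B, "b")
    | to_triples(SOURCE_C, "c")
)
print(employment_history(graph, "p1"))
```
      </preformat>
      <p>The sketch assumes that the three sources already share person and employer identifiers; in practice, aligning such identifiers and agreeing on the vocabulary is where the ontology comes in.</p>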
    </sec>
    <sec id="sec-5">
      <title>Summary</title>
      <p>It is difficult for organizations to identify when KG technologies are a viable means to
an end. There are many valuable resources on KGs available, but they either focus on
constructing KGs or report on KG use cases that organizations need to extrapolate. To
help organizations determine whether a KG will help tackle a concrete problem, we
designed the Smals KG Checklist. The tool is meant to be filled in by a group of
stakeholders, with a facilitator taking the lead and guiding the discussion.</p>
      <p>Disclaimer. The views and opinions expressed in this paper are those of the authors and
do not express the views or opinions of Smals.</p>
      <p>Acknowledgements. We thank Gunther
Hellebaut, Karel Van Eeckhoutte, and Catherine Vanden Daelen for their feedback.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          CoRR. abs/2003.0, (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Hubauer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamparter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herzig</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          :
          <article-title>Use Cases of the Industrial Knowledge Graph at Siemens</article-title>
          . In: van Erp,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Atre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            , and
            <surname>Fortuna</surname>
          </string-name>
          , C. (eds.)
          <source>Proceedings of the ISWC</source>
          <year>2018</year>
          <article-title>Posters &amp; Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC</article-title>
          <year>2018</year>
          ), Monterey, USA, October 8th to 12th,
          <year>2018</year>
          . CEUR-WS.org (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetere</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
          </string-name>
          , H. eds:
          <article-title>Exploiting Linked Data and Knowledge Graphs in Large Organisations</article-title>
          . Springer (
          <year>2017</year>
          ). https://doi.org/10.1007/978-3-319-45654-6.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Reijers</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koidl</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pandit</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordijn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Discussing Ethical Impacts in Research and Innovation: The Ethics Canvas</article-title>
          . In: Kreps,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Ess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Leenen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            , and
            <surname>Kimppa</surname>
          </string-name>
          , K. (eds.)
          <source>This Changes Everything - ICT and Climate Change: What Can We Do? - 13th IFIP TC 9 International Conference on Human Choice and Computers, HCC13</source>
          <year>2018</year>
          ,
          Held at the 24th IFIP World Computer Congress, WCC 2018, Poznan, Poland, September 19-21
          ,
          <year>2018</year>
          , Proceedi. pp.
          <fpage>299</fpage>
          -
          <lpage>313</lpage>
          . Springer (
          <year>2018</year>
          ). https://doi.org/10.1007/978-3-319-99605-9_23.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Mouromtsev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emelyanov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morozov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Razdyakonov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galkin</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The Simple Web-based Tool for Visualization and Sharing of Semantic Data and Ontologies</article-title>
          .
          <source>In: Proceedings of the ISWC</source>
          <year>2015</year>
          <article-title>Posters &amp; Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-</article-title>
          <year>2015</year>
          ), Bethlehem, PA, USA, October 11
          ,
          <year>2015</year>
          . CEUR-WS.org (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>In</surname>
            : Garijo,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lawrynowicz</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (eds.)
          <article-title>Proceedings of the EKAW 2020 Posters and Demonstrations Session co-located with 22nd International Conference on Knowledge Engineering and Knowledge Management (EKAW</article-title>
          <year>2020</year>
          ),
          Globally online &amp; Bozen-Bolzano, Italy, September 17
          ,
          <year>2020</year>
          . pp.
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          . CEUR-WS.org (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomelli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherkhonov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
          </string-name>
          , I.: SemFacet.
          <source>In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17</source>
          . pp.
          <fpage>2475</fpage>
          -
          <lpage>2478</lpage>
          . ACM Press, New York, New York, USA (
          <year>2017</year>
          ). https://doi.org/10.1145/3132847.3133192.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Koho</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvönen</surname>
          </string-name>
          , E.: SPARQL Faceter -
          <article-title>Client-side Faceted Search Based on SPARQL</article-title>
          . In: Troncy,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.J.B.</given-names>
            ,
            <surname>Kurz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            , and
            <surname>Sande</surname>
          </string-name>
          , M. Vander (eds.)
          <source>Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop co-located with the 13th Extended Semantic Web Conference ESWC</source>
          <year>2016</year>
          , Heraklion, Crete, Greece, May 30
          ,
          <year>2016</year>
          . CEUR-WS.org (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>