<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Access &amp; Representation Layer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kevin Angele</string-name>
          <email>kevin.angele@sti2.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Umutcan Şimşek</string-name>
          <email>umutcan.simsek@sti2.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dieter Fensel</string-name>
          <email>dieter.fensel@sti2.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Knowledge Graph, Knowledge Access, Knowledge Access &amp; Representation Layer</institution>
          ,
          <addr-line>Knowledge Activators</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Onlim GmbH</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Semantic Technology Institute Innsbruck, University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Knowledge graphs integrate data from heterogeneous sources, resulting in a very large set of statements to be stored and managed. Handling large amounts of data and supporting multiple use cases with possibly conflicting requirements in a single knowledge graph is infeasible. To this end, we present our ongoing work on a “Knowledge Access &amp; Representation Layer” on top of a knowledge graph. With Knowledge Activators at its core, the layer reduces the amount of data to operate on, supports conflicting requirements, and allows external data to be integrated dynamically. We mainly present the specifications and tasks of a Knowledge Activator as the core of the layer.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Knowledge graphs integrate data from heterogeneous sources for powering intelligent
applications. Beyond a certain size, operations on knowledge graphs (like error detection, duplicate
detection, or query answering) are hard to scale. Additionally, representing various points of
view with different, possibly conflicting requirements on the underlying data is infeasible
within a single knowledge graph. Besides, specific use cases require data from external services
for evaluating a single request, and this data should be integrated on the fly. This results in three
main challenges: handling the vast number of statements (size), supporting various
(conflicting) points of view, and dynamically integrating external data. These challenges significantly
affect generic applications designed to support multiple use cases.</p>
      <p>This paper presents our ongoing work on a layer called “Knowledge Access &amp; Representation
Layer” on top of knowledge graphs, operating on use-case-specific subgraphs (views). Those
views support different points of view and reduce the amount of data the operations need
to handle. Additionally, context-specific data is dynamically integrated without affecting the
underlying knowledge graph<sup>1</sup>. The main contribution of this paper is the introduction of
Knowledge Activators as the core of the layer and an outline of future directions for the implementation
of this layer.</p>
      <p><sup>1</sup> This will be part of the future work and is not addressed in this paper.</p>
      <p>CEUR Workshop Proceedings</p>
    </sec>
    <sec id="sec-2">
      <title>2. Exemplary Use Case</title>
      <p>An exemplary use case is the German Tourism Knowledge Graph (GTKG). The GTKG integrates
and curates data from all regional tourism marketing organizations in Germany, resulting in
the integration of 16 heterogeneous sources. Currently, the GTKG contains around 31K Events,
32K POIs, and 5K Tours, adding up to more than 23M statements<sup>2</sup>. The number of statements
grows as more regional marketing organizations integrate their data into the GTKG.</p>
      <p>Operations<sup>3</sup> on the GTKG become slower the larger the knowledge graph gets (size challenge).
This can result in severe issues, as these operations must not interfere with query answering.
In practice, they decrease the performance of query answering tasks
and might cause temporary inconsistencies. Equally important, different regions have varying
constraints on the underlying data. Also, custom inference rules are used to infer
region-specific knowledge (various points of view challenge). Those (conflicting) points of view are not
representable within a single knowledge graph, especially when an application needs contextual
knowledge that does not necessarily belong to the data in the overall knowledge graph.</p>
      <p>Let us consider two intelligent applications recommending vegan restaurants and restaurants
for meat-eaters. For the recommendations, each application requires a rule inferring a ranking
score used to show the top-ranked restaurants. The rule for the vegan application infers
a ranking score based on the variety of vegan dishes. Analogously, a restaurant’s variety of
meat dishes is essential for meat-eaters. Using both rules within a single knowledge graph is
impossible as they infer conflicting ranking scores. It might not be tragic for a meat-eater to get
a vegetarian recommendation, but the reverse situation must be avoided. Furthermore, both
intelligent applications serving information about restaurants only need a tiny amount of data
from the overall knowledge graph. When operating only on the relevant data, the number of
triples to be considered for reasoning can be reduced from 23M to a few hundred thousand.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Knowledge Access &amp; Representation Layer</title>
      <p>The Knowledge Access &amp; Representation Layer (see Figure 1) acts as a middle layer between
the applications consuming the knowledge graph and the knowledge graph itself. In this setup,
the knowledge graph is treated as a data lake, which allows it to be erroneous and incomplete. For
the layer on top, use-case-related subgraphs, so-called views, are extracted. Those views reduce
the amount of data that needs to be considered for the operations and allow various points of view.
Knowledge Activators, which operate on and store those views, are at the core of the introduced layer.
A Knowledge Activator consists of a Micro TBox defining the terminology, constraints,
and rules, and the subgraph definition used to extract the relevant subgraph from the knowledge
graph. Besides, Knowledge Activators allow integrating context-specific data from external
sources with the data contained in the view by using an External Data Integrator<sup>4</sup>. The flow from
the applications to the Knowledge Activators is defined with the help of a control flow engine
and a data flow connector. The communication between the Knowledge Activators
and the underlying knowledge graph is handled by a graph database connector.</p>
      <p><sup>2</sup> The latest statistics can be found at https://open-data-germany.org/datenbestand/ (last access: 13-05-2022).</p>
      <p><sup>3</sup> Currently focused on error detection and duplicate detection.</p>
      <p><sup>4</sup> This will be part of the future work and is not addressed in this paper.</p>
      <p>This paper will focus on Knowledge Activators since they are the core of this layer.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Knowledge Activators</title>
      <p>Knowledge Activators operating on views are at the core of the Knowledge Access &amp;
Representation Layer. Extracting and hosting views, cleaning and enriching those, and integrating
external data are the tasks a Knowledge Activator handles. Figure 2 gives an overview of the
specifications and engines required to fulfill those tasks.</p>
      <p>The specifications are grouped into Micro TBox and Subgraph Definition and, together
with the extracted data, form a so-called view. A view is a use-case-specific subgraph with a context
(Micro TBox) built on top of it. Use-case-specific implies that only data relevant for a given
use case is extracted from the underlying knowledge graph. A view reduces the amount of data to operate
on and supports various (conflicting) points of view by defining a specific view for each use
case. After extracting the view from the knowledge graph, customizations can be applied to
adapt the view to the requirements of the use case.</p>
      <p>A Micro TBox contains the Terminology, Constraints, and Rules. The terminology defines
types, properties, and the type hierarchy used within the view, not necessarily aligned with
the underlying knowledge graph. It is possible to use a completely different terminology or
even a different knowledge representation formalism. Besides, constraints define
specific requirements that instances need to fulfill, and rules are used to infer new knowledge based
on existing facts.</p>
      <p>
        Subgraph Definition specifications are used for extracting the data for the view and are
defined by Knowledge Engineers. A Data Selection specification identifies the relevant data to be
extracted from the underlying knowledge graph (for example, by using GraphQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Mapping the
terminology of the underlying knowledge graph to the terminology used within the view is done
by a Data Mapping specification (e.g., using RML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). The subgraph definition specification is
used by the Data Extraction Engine for extracting the data (for initializing a Knowledge Activator
or on the fly).
      </p>
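      <p>As a rough illustration of how a Subgraph Definition could drive a Data Extraction Engine, the following Python sketch selects the triples of all instances of the relevant types and renames terms to the view's terminology. It works on plain tuples instead of GraphQL or RML, and the names SubgraphDefinition, extract_view, and all example identifiers are hypothetical; they are assumptions made for this sketch and not part of the described implementation.</p>
      <preformat>
```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object). All names here are illustrative assumptions.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


@dataclass
class SubgraphDefinition:
    """Hypothetical stand-in for a Data Selection plus a Data Mapping specification."""
    selected_types: set[str]  # Data Selection: types relevant for the use case
    term_mapping: dict[str, str] = field(default_factory=dict)  # Data Mapping: KG term -> view term


def extract_view(graph: list[Triple], definition: SubgraphDefinition) -> list[Triple]:
    """Extract all triples of instances of the selected types and map their terms."""
    # 1. Data Selection: collect the subjects typed with one of the selected types.
    subjects = {s for s, p, o in graph if p == RDF_TYPE and o in definition.selected_types}

    # 2. Data Mapping: rename predicates and objects that appear in the mapping.
    def rename(term: str) -> str:
        return definition.term_mapping.get(term, term)

    return [(s, rename(p), rename(o)) for s, p, o in graph if s in subjects]


knowledge_graph = [
    ("ex:r1", RDF_TYPE, "schema:Restaurant"),
    ("ex:r1", "schema:name", "Green Leaf"),
    ("ex:e1", RDF_TYPE, "schema:Event"),
    ("ex:e1", "schema:name", "City Festival"),
]
definition = SubgraphDefinition(
    selected_types={"schema:Restaurant"},
    term_mapping={"schema:Restaurant": "view:Eatery", "schema:name": "view:label"},
)
view = extract_view(knowledge_graph, definition)
```
      </preformat>
      <p>In this sketch, the event triples never reach the view, so later operations only have to consider the restaurant data.</p>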
      <p>
        After extraction from the underlying knowledge graph, the data needs to be cleaned and
enriched because the knowledge graph can be erroneous and incomplete (we allow the
underlying knowledge graph to be a data lake). Cleaning the data in the view (Knowledge
Cleaning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) is about improving the correctness by identifying wrong assertions (called error
detection) and correcting those (called error correction). We focus on error detection by applying
integrity constraints to the data. Likewise, enriching the data in the view (Knowledge
Enrichment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) targets the completeness of a view by integrating external sources and identifying
duplicates the integration might cause. Furthermore, new knowledge can be inferred based on
the existing facts by using rules.
      </p>
      <p>An Error Detection Engine is used to identify wrong assertions using the terminology and
constraints defined in the Micro TBox. Erroneous statements can be divided into Syntactical
Errors, e.g., a URI that contains whitespace, and Semantic Errors, where statements do not conform
to the (Micro) TBox, e.g., the value of a property <monospace>age</monospace> is a <monospace>Text</monospace> instead of a <monospace>Number</monospace>. The Error Detection Engine
produces a validation report containing all violations that need manual
fixing.</p>
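      <p>The two error classes can be illustrated with a minimal Python sketch. It assumes a toy constraint table that maps a property to the expected value type; the names CONSTRAINTS and detect_errors and the example statements are hypothetical, and the sketch does not reflect how VeriGraph or any existing engine works.</p>
      <preformat>
```python
# Hypothetical Micro TBox fragment: expected Python type per property.
CONSTRAINTS = {"age": int, "name": str}


def detect_errors(statements):
    """Return a validation report as a list of (statement, kind, message) tuples."""
    report = []
    for subject, prop, value in statements:
        statement = (subject, prop, value)
        # Syntactic error: the subject URI must not contain whitespace.
        if any(ch.isspace() for ch in subject):
            report.append((statement, "syntactic", "URI contains whitespace"))
        # Semantic error: the value does not have the type required for the property.
        expected = CONSTRAINTS.get(prop)
        if expected is not None and not isinstance(value, expected):
            message = f"{prop} expects {expected.__name__}, got {type(value).__name__}"
            report.append((statement, "semantic", message))
    return report


statements = [
    ("http://example.org/alice", "age", 42),
    ("http://example.org/bob", "age", "forty"),
    ("http://example.org/carol smith", "name", "Carol"),
]
report = detect_errors(statements)
```
      </preformat>
      <p>The report only lists violations; deciding how to fix them is left to a human, in line with the manual fixing step described above.</p>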
      <p>
        The Duplicate Detection task aims to increase the completeness of a view by introducing
missing sameAs assertions between instances describing the same entity, utilizing a Duplicate
Detection Engine. Identifying and resolving duplicates is challenging, and many methods and
techniques have been developed to tackle this issue [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the end, the Duplicate Detection
Engine provides a list of possible duplicates that a Knowledge Engineer needs to check manually.
      </p>
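      <p>A simple way to picture the output of such an engine is a list of candidate pairs with a similarity score. The Python sketch below compares normalized names with the standard library's difflib; this is a swapped-in, deliberately naive technique chosen for illustration, not the method used in [4], and the function names, the threshold, and the example instances are assumptions.</p>
      <preformat>
```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def duplicate_candidates(instances: dict[str, str], threshold: float = 0.9):
    """Return (id_a, id_b, score) for all pairs whose normalized names are similar."""
    candidates = []
    for (id_a, name_a), (id_b, name_b) in combinations(instances.items(), 2):
        score = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
        if score >= threshold:
            candidates.append((id_a, id_b, score))
    return candidates


instances = {
    "ex:h1": "Hotel Adler",
    "ex:h2": "hotel  adler",
    "ex:c1": "Cafe Central",
}
candidates = duplicate_candidates(instances)
```
      </preformat>
      <p>Each returned pair is only a suggestion for a sameAs assertion; a Knowledge Engineer would still confirm or reject it.</p>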
      <p>Inferring new knowledge using the rules is handled by a Reasoning Engine. When evaluating
a request coming from an application, corresponding rules are evaluated, and inferred facts are
included in the response. Besides, the reasoning engine integrates data from external services
with the data from a view.</p>
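      <p>To make the benefit of use-case-specific rules concrete, the following Python sketch models two activators over the same restaurant data, each with its own ranking rule, mirroring the vegan and meat-eater applications from Section 2. The class KnowledgeActivator, its method top_ranked, and the data are hypothetical simplifications; rules are plain Python functions here rather than a rule language.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Callable

Restaurant = dict  # e.g. {"name": ..., "vegan_dishes": ..., "meat_dishes": ...}


@dataclass
class KnowledgeActivator:
    """Hypothetical minimal activator: a view plus one use-case-specific ranking rule."""
    view: list[Restaurant]
    ranking_rule: Callable[[Restaurant], int]

    def top_ranked(self, k: int = 1) -> list[str]:
        # The rule is evaluated per request; inferred scores stay inside this activator.
        scored = [(self.ranking_rule(r), r["name"]) for r in self.view]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [name for _, name in scored[:k]]


restaurants = [
    {"name": "Green Leaf", "vegan_dishes": 12, "meat_dishes": 0},
    {"name": "Grill House", "vegan_dishes": 1, "meat_dishes": 15},
]

# Two activators with conflicting rules over the same underlying data.
vegan_app = KnowledgeActivator(restaurants, ranking_rule=lambda r: r["vegan_dishes"])
meat_app = KnowledgeActivator(restaurants, ranking_rule=lambda r: r["meat_dishes"])

vegan_top = vegan_app.top_ranked()
meat_top = meat_app.top_ranked()
```
      </preformat>
      <p>Because each rule lives in its own activator, the two ranking scores never have to coexist in a single graph.</p>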
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>This paper presented our ongoing work on the “Knowledge Access &amp; Representation Layer”,
which allows the underlying knowledge graph to be a vast, erroneous, and incomplete data lake.
For powering intelligent applications using the knowledge graph, Knowledge Activators extract
and host views, clean and enrich those, and integrate external data. This allows for
use-case-specific constraints and rules. Additionally, the amount of data to operate on is much
smaller, which is significant for the performance of the engines used.</p>
      <p>
        So far, a first version of the graph database connector [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the external data integrator [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
and the reasoning engine have been implemented. For the error detection engine, we are further developing
VeriGraph [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and for duplicate detection, we are further developing Duplicate Detection as a Service [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Furthermore, for
defining the data flow, we will use Apache NiFi and an adaptation of the Corinthian Abstract
State Machine (CASM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for the control flow engine.
      </p>
      <p>Dynamic data integration was not addressed in this paper. In the future, we will
conceptualize and implement the on-the-fly integration of external data with data from the views using
the external data integrator and the reasoning engine.</p>
      <p>As next steps, we will first finalize the conceptualization for integrating external data with
data from the view. Then the existing implementations (Database Extraction, Duplicate
Detection, Error Detection, External Data Integration, and Reasoning Engine) will be composed into
the Knowledge Activators. Afterward, CASM will be adapted to fit our requirements for a
control flow engine. After implementing the Knowledge Access &amp; Representation Layer, an
extensive evaluation will be conducted to showcase the performance improvements when
operating on smaller subgraphs instead of the full knowledge graph. Finally, the layer
will be used on top of the GTKG to support various applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Angele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meitinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bußjäger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Föhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>GraphSPARQL: A GraphQL interface for linked data</article-title>
          ,
          <source>in: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>778</fpage>
          -
          <lpage>785</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
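          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Van de Walle</surname>
          </string-name>
          ,
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          ,
          <source>in: LDOW</source>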
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Şimşek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Angele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kärle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Panasiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Toma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wahler</surname>
          </string-name>
          , Knowledge Graphs: Methodology, Tools and Selected Use Cases, Springer Nature,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Opdenplatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kärle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>Duplicate Detection as a Service (DDaaS)</article-title>
          ,
          <source>Technical Report D413y2</source>
          , MindLab Project,
          <year>2019</year>
          . URL: https://drive.google.com/file/d/1UfWwBLoxLmcdRYLudxJs90lq5E80bMsk/view?usp=sharing.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kärle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Şimşek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gerrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Angele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>KARL SWS Integrator</article-title>
          ,
          <source>Technical Report D544y2</source>
          , MindLab Project,
          <year>2019</year>
          . URL: https://drive.google.com/file/d/1dxlVMvwiy9C8pn0IwJEQ6REltE-Qcy-M/view.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Angele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Holzknecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Panasiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Şimşek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>VeriGraph: A verification framework for Knowledge Integrity</article-title>
          ,
          <source>Technical Report D312y2</source>
          , MindLab Project,
          <year>2019</year>
          . URL: https://drive.google.com/file/d/1RudX-yt9JxomMb6OBCi4UD10vLtqWZBv/view.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lezuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Barany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krall</surname>
          </string-name>
          ,
          <article-title>CASM: Implementing an abstract state machine based programming language</article-title>
          ,
          <source>Software Engineering 2013-Workshopband</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>