<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards a Knowledge Access &amp; Representation Layer</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Kevin</forename><surname>Angele</surname></persName>
							<email>kevin.angele@sti2.at</email>
							<affiliation key="aff0">
								<orgName type="department">Semantic Technology Institute Innsbruck</orgName>
								<orgName type="institution">University of Innsbruck</orgName>
								<address>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Onlim GmbH</orgName>
								<address>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Umutcan</forename><surname>Şimşek</surname></persName>
							<email>umutcan.simsek@sti2.at</email>
							<affiliation key="aff0">
								<orgName type="department">Semantic Technology Institute Innsbruck</orgName>
								<orgName type="institution">University of Innsbruck</orgName>
								<address>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dieter</forename><surname>Fensel</surname></persName>
							<email>dieter.fensel@sti2.at</email>
							<affiliation key="aff0">
								<orgName type="department">Semantic Technology Institute Innsbruck</orgName>
								<orgName type="institution">University of Innsbruck</orgName>
								<address>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards a Knowledge Access &amp; Representation Layer</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BC795806090C5067EA62FBEBF0C08C30</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph</term>
					<term>Knowledge Access</term>
					<term>Knowledge Access &amp; Representation Layer</term>
					<term>Knowledge Activators</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge graphs integrate data from heterogeneous sources, resulting in a very large set of statements to be stored and managed. Handling large amounts of data and supporting multiple use cases with possibly conflicting requirements in a single knowledge graph is infeasible. To this end, we present our ongoing work on a "Knowledge Access &amp; Representation Layer" on top of a knowledge graph. With Knowledge Activators at its core, the layer reduces the amount of data to operate on, supports conflicting requirements, and allows external data to be integrated dynamically. We mainly present the specifications and tasks of a Knowledge Activator as the core of the layer.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Knowledge graphs integrate data from heterogeneous sources to power intelligent applications. Beyond a certain size, operations on a knowledge graph (such as error detection, duplicate detection, or query answering) become hard to scale. Additionally, representing various points of view with different, possibly conflicting, requirements on the underlying data is infeasible within a single knowledge graph. Moreover, some use cases require data from external services to evaluate a single request, and this data should be integrated on the fly. This results in three main challenges: handling the vast number of statements (size), supporting various (conflicting) points of view, and dynamically integrating external data. These challenges significantly affect generic applications designed to support multiple use cases.</p><p>This paper presents our ongoing work on a layer called "Knowledge Access &amp; Representation Layer" on top of knowledge graphs, which operates on use-case-specific subgraphs (views). These views support different points of view and reduce the amount of data the operations need to handle. Additionally, context-specific data is integrated dynamically without affecting the underlying knowledge graph. The main contributions of this paper are the introduction of Knowledge Activators as the core of the layer and an outline of future directions for the implementation of this layer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Exemplary Use Case</head><p>An exemplary use case is the German Tourism Knowledge Graph (GTKG). The GTKG integrates and curates data from all regional tourism marketing organizations in Germany, which amounts to 16 heterogeneous sources. Currently, the GTKG contains around 31K Events, 32K POIs, and 5K Tours, amounting to more than 23M statements<ref type="foot" target="#foot_0">2</ref>. The number of statements grows rapidly as more regional marketing organizations integrate their data into the GTKG.</p><p>Operations<ref type="foot" target="#foot_1">3</ref> on the GTKG become slower as the knowledge graph grows (size challenge). This can result in severe issues, as these operations must not interfere with query answering, yet running them degrades query-answering performance and might cause temporary inconsistencies. Equally important, different regions have varying constraints on the underlying data, and custom inference rules are used to infer region-specific knowledge (various points of view challenge). Such (conflicting) points of view cannot be represented within a single knowledge graph, especially when an application needs contextual knowledge that does not necessarily belong in the overall knowledge graph.</p><p>Consider two intelligent applications, one recommending restaurants to vegans and one recommending restaurants to meat-eaters. Each application requires a rule that infers a ranking score used to show the top-ranked restaurants. The rule for the vegan application infers a ranking score based on the variety of vegan dishes; analogously, the variety of a restaurant's meat dishes is essential for meat-eaters. Using both rules within a single knowledge graph is impossible, as they infer conflicting ranking scores.
It might not be tragic for a meat-eater to get a vegetarian recommendation, but the reverse situation must be avoided. Furthermore, both applications serving information about restaurants only need a tiny fraction of the data in the overall knowledge graph. When operating only on the relevant data, the number of triples to be considered for reasoning can be reduced from 23M to a few hundred thousand.</p></div>
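<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the conflict concrete, the following Python sketch hosts each application's ranking rule in its own view. It is an illustration under stated assumptions rather than the implementation of the layer: the triples, the ex: identifiers, the KnowledgeActivator class, and the scoring rule (counting the dishes of the preferred category) are all hypothetical. The sketch only assumes that a view is a selected subset of the knowledge graph and that a view's rules are evaluated against that subset alone.</p>
<eg>
```python
from collections import Counter
from typing import Callable

Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], set[Triple]]

# Hypothetical toy knowledge graph: two restaurants, their dishes, and one
# unrelated event that neither restaurant application needs.
KG: frozenset[Triple] = frozenset({
    ("ex:GreenLeaf", "ex:serves", "ex:Falafel"),
    ("ex:GreenLeaf", "ex:serves", "ex:TofuBowl"),
    ("ex:GreenLeaf", "ex:serves", "ex:Schnitzel"),
    ("ex:Grillhaus", "ex:serves", "ex:Schnitzel"),
    ("ex:Grillhaus", "ex:serves", "ex:Steak"),
    ("ex:Falafel", "ex:category", "vegan"),
    ("ex:TofuBowl", "ex:category", "vegan"),
    ("ex:Schnitzel", "ex:category", "meat"),
    ("ex:Steak", "ex:category", "meat"),
    ("ex:Oktoberfest", "ex:type", "ex:Event"),
})


def select_restaurant_data(kg: frozenset[Triple]) -> set[Triple]:
    """Hypothetical data selection: copy only restaurant-related triples into a view."""
    return {t for t in kg if t[1] in ("ex:serves", "ex:category")}


def ranking_rule(category: str) -> Rule:
    """Hypothetical rule: score a restaurant by its number of `category` dishes."""
    def rule(view: set[Triple]) -> set[Triple]:
        dish_category = {s: o for s, p, o in view if p == "ex:category"}
        restaurants = {s for s, p, _ in view if p == "ex:serves"}
        counts = Counter(
            s for s, p, o in view
            if p == "ex:serves" and dish_category.get(o) == category
        )
        # Counter yields 0 for restaurants without any matching dish.
        return {(r, "ex:rankingScore", str(counts[r])) for r in restaurants}
    return rule


class KnowledgeActivator:
    """Hypothetical activator: one view plus the rules of its own Micro TBox."""

    def __init__(self, kg: frozenset[Triple], rules: list[Rule]) -> None:
        self.view = select_restaurant_data(kg)
        self.rules = rules

    def ranking(self) -> list[str]:
        inferred: set[Triple] = set()
        for rule in self.rules:
            inferred |= rule(self.view)  # inferred facts stay local to this activator
        scores = {s: int(o) for s, p, o in inferred if p == "ex:rankingScore"}
        return sorted(scores, key=scores.get, reverse=True)


vegan_app = KnowledgeActivator(KG, [ranking_rule("vegan")])
meat_app = KnowledgeActivator(KG, [ranking_rule("meat")])
print(vegan_app.ranking())  # ['ex:GreenLeaf', 'ex:Grillhaus']
print(meat_app.ranking())   # ['ex:Grillhaus', 'ex:GreenLeaf']
```
</eg>
<p>Because each rule writes its scores only into the result of its own view, the two rankings can disagree without the knowledge graph ever holding two ranking scores for the same restaurant. The selection step also drops unrelated data, such as the event, before any rule runs.</p></div>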
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Knowledge Access and Representation Layer</head><p>The Knowledge Access &amp; Representation Layer (see Figure <ref type="figure" target="#fig_0">1</ref>) acts as a middle layer between the applications consuming the knowledge graph and the knowledge graph itself. In this setup, the knowledge graph is treated as a data lake that is allowed to be erroneous and incomplete. For the layer on top, use-case-specific subgraphs, so-called views, are extracted. These views reduce the amount of data the operations need to consider and allow for various points of view. At the core of the layer are Knowledge Activators, which store and operate on these views. A Knowledge Activator consists of a Micro TBox, which defines the terminology, constraints, and rules, and a Subgraph Definition, which is used to extract the relevant subgraph from the knowledge graph. In addition, Knowledge Activators can integrate context-specific data from external sources with the data contained in the view by using an External Data Integrator<ref type="foot" target="#foot_2">4</ref>. A control flow engine and a data flow connector define the flow from the applications to the Knowledge Activators, and a graph database connector handles the communication between the Knowledge Activators and the underlying knowledge graph. This paper focuses on Knowledge Activators, since they are the core of the layer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Knowledge Activator</head><p>Figure <ref type="figure" target="#fig_1">2</ref> shows the specifications and engines composing a Knowledge Activator. The specifications are grouped into the Micro TBox and the Subgraph Definition; together with the extracted data, they form a so-called view. A view is a use-case-specific subgraph with a context (the Micro TBox) built on top of it. Use-case-specific means that only the data relevant for a given use case is extracted from the underlying knowledge graph. A view reduces the amount of data to operate on, and defining a separate view for each use case supports various (conflicting) points of view.
After extracting the view from the knowledge graph, customizations can be applied to adapt it to the requirements of the use case.</p><p>A Micro TBox contains the Terminology, Constraints, and Rules. The terminology defines the types, properties, and type hierarchy used within the view, which need not be aligned with those of the underlying knowledge graph. A view may use a completely different terminology, or even a different knowledge representation formalism. Constraints define requirements that instances must fulfill, and rules infer new knowledge from existing facts.</p><p>Subgraph Definition specifications, defined by Knowledge Engineers, are used to extract the data for the view. A Data Selection specification determines which data is extracted from the underlying knowledge graph (for example, using GraphQL <ref type="bibr" target="#b0">[1]</ref>), and a Data Mapping specification maps the terminology of the underlying knowledge graph to the terminology used within the view (e.g., using RML <ref type="bibr" target="#b1">[2]</ref>). The Data Extraction Engine uses the Subgraph Definition to extract the data, either when initializing a Knowledge Activator or on the fly.</p><p>After extraction, the data needs to be cleaned and enriched, because the underlying knowledge graph may be erroneous and incomplete (we allow it to be a data lake). Cleaning the data in the view (Knowledge Cleaning <ref type="bibr" target="#b2">[3]</ref>) aims to improve its correctness by identifying wrong assertions (error detection) and correcting them (error correction). We focus on error detection by applying integrity constraints to the data.
Likewise, enriching the data in the view (Knowledge Enrichment <ref type="bibr" target="#b2">[3]</ref>) aims to improve the completeness of a view by integrating external sources and identifying the duplicates this integration may introduce. In addition, rules can be used to infer new knowledge from existing facts.</p><p>An Error Detection Engine identifies wrong assertions using the terminology and constraints defined in the Micro TBox. Erroneous statements fall into two groups: Syntactical Errors, e.g., a URI that contains whitespace, and Semantic Errors, where statements do not conform to the (Micro) TBox, e.g., the value of the property age is a Text instead of a Number. The Error Detection Engine produces a validation report listing all violations, which then need to be fixed manually.</p><p>The Duplicate Detection task aims to increase the completeness of a view by adding missing sameAs assertions between instances that describe the same entity, using a Duplicate Detection Engine. Identifying and resolving duplicates is challenging, and many methods and techniques have been proposed to tackle this issue <ref type="bibr" target="#b3">[4]</ref>. In the end, the Duplicate Detection Engine provides a list of possible duplicates that a Knowledge Engineer needs to check manually.</p><p>A Reasoning Engine infers new knowledge using the rules. When a request from an application is evaluated, the corresponding rules are evaluated, and the inferred facts are included in the response. The Reasoning Engine also integrates data from external services with the data from a view.</p></div>
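<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following Python sketch illustrates how an Error Detection Engine and a Duplicate Detection Engine could operate on a view. It is a minimal sketch, not our implementation (which builds on VeriGraph and Duplicate Detection as a Service): the property names ex:age and ex:name, the dictionary of expected value types standing in for Micro TBox constraints, and the similarity measure (difflib.SequenceMatcher from the Python standard library with an arbitrary threshold of 0.85) are assumptions made for illustration.</p>
<eg>
```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

Triple = tuple[str, str, object]  # objects may be URIs (str) or literals

# Hypothetical Micro TBox fragment: the expected literal type per property.
MICRO_TBOX_RANGES: dict[str, type] = {"ex:age": int, "ex:name": str}


@dataclass(frozen=True)
class Violation:
    kind: str  # "syntactic" or "semantic"
    triple: Triple
    message: str


def detect_errors(view: set[Triple], ranges: dict[str, type]) -> list[Violation]:
    """Sketch of error detection: return a validation report for manual fixing."""
    report = []
    for s, p, o in view:
        # Syntactical error: a URI must not contain whitespace. Objects may be
        # literals, so only subject and predicate are checked in this sketch.
        for uri in (s, p):
            if re.search(r"\s", uri):
                report.append(Violation("syntactic", (s, p, o), f"whitespace in URI {uri!r}"))
        # Semantic error: the value does not conform to the declared range.
        expected = ranges.get(p)
        if expected is not None and not isinstance(o, expected):
            report.append(Violation(
                "semantic", (s, p, o),
                f"{p} expects {expected.__name__}, got {type(o).__name__}",
            ))
    return sorted(report, key=lambda v: (v.kind, v.message))


def duplicate_candidates(view: set[Triple], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Sketch of duplicate detection: propose sameAs candidates for manual checking.

    The name-similarity measure and the 0.85 threshold are illustrative assumptions.
    """
    names = {s: o for s, p, o in view if p == "ex:name" and isinstance(o, str)}
    candidates = []
    for (a, name_a), (b, name_b) in combinations(sorted(names.items()), 2):
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= threshold:
            candidates.append((a, b, round(score, 2)))
    return candidates


# Hypothetical view containing one likely duplicate pair and two errors.
view: set[Triple] = {
    ("ex:r1", "ex:name", "Café Zentral"),
    ("ex:r2", "ex:name", "Cafe Zentral"),
    ("ex:r3", "ex:name", "Grillhaus"),
    ("ex:p1", "ex:age", "forty"),          # Text instead of Number
    ("ex:bad uri", "ex:name", "Broken"),   # whitespace in a URI
}

for violation in detect_errors(view, MICRO_TBOX_RANGES):
    print(violation.kind, "|", violation.message)
print(duplicate_candidates(view))  # [('ex:r1', 'ex:r2', 0.92)]
```
</eg>
<p>Both functions only report findings: the validation report and the list of sameAs candidates are meant for a Knowledge Engineer to check, matching the manual step described above, rather than changing the view automatically.</p></div>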
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>This paper presented our ongoing work on the "Knowledge Access &amp; Representation Layer", which allows the underlying knowledge graph to be a vast, erroneous, and incomplete data lake. To power intelligent applications with the knowledge graph, Knowledge Activators extract and host views, clean and enrich them, and combine them with external data. This allows for use-case-specific constraints and rules. Additionally, the amount of data to operate on is much smaller, which is significant for the performance of the engines involved.</p><p>So far, first versions of the graph database connector <ref type="bibr" target="#b0">[1]</ref>, the external data integrator <ref type="bibr" target="#b4">[5]</ref>, and the reasoning engine have been implemented. For the error detection engine, we are further developing VeriGraph <ref type="bibr" target="#b5">[6]</ref>, and for duplicate detection, we are further developing Duplicate Detection as a Service <ref type="bibr" target="#b3">[4]</ref>. For defining the data flow, we will use Apache NiFi<ref type="foot" target="#foot_3">5</ref>, and for the control flow engine, an adaptation of the Corinthian Abstract State Machine (CASM) <ref type="bibr" target="#b6">[7]</ref>.</p><p>This paper did not address dynamic data integration. In the future, we will conceptualize and implement the on-the-fly combination of external data with data from the views, using the external data integrator and the reasoning engine.</p><p>As next steps, we will first finalize this conceptualization. Then, the existing implementations (Database Extraction, Duplicate Detection, Error Detection, External Data Integration, and Reasoning Engine) will be composed into Knowledge Activators. Afterward, CASM will be adapted to fit our requirements for a control flow engine.
After implementing the Knowledge Access &amp; Representation Layer, we will conduct an extensive evaluation to showcase the performance improvements gained by operating on smaller subgraphs instead of the entire knowledge graph. Finally, the layer will be deployed on top of the GTKG to support various applications.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the Knowledge Access and Representation Layer</figDesc><graphic coords="3,112.27,123.07,370.73,152.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Specifications and Engines composing a Knowledge Activator</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">The latest statistics can be found at: https://open-data-germany.org/datenbestand/ (last accessed: 13-05-2022)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Currently, the focus is on error detection and duplicate detection.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">This will be part of the future work and is not addressed in this paper.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://nifi.apache.org/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">GraphSPARQL: A GraphQL Interface for Linked Data</title>
		<author>
			<persName><forename type="first">K</forename><surname>Angele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Meitinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bußjäger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Föhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fensel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing</title>
				<meeting>the 37th ACM/SIGAPP Symposium on Applied Computing</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="778" to="785" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
		<title level="m">RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</title>
		<imprint>
			<publisher>LDOW</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Fensel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Şimşek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Angele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Huaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kärle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Panasiuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Toma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Umbrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wahler</surname></persName>
		</author>
		<title level="m">Knowledge Graphs: Methodology, Tools and Selected Use Cases</title>
				<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Duplicate Detection as a Service (DDaaS)</title>
		<author>
			<persName><forename type="first">J</forename><surname>Opdenplatz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Huaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kärle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Umbrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fensel</surname></persName>
		</author>
		<idno>D413y2</idno>
		<ptr target="https://drive.google.com/file/d/1UfWwBLoxLmcdRYLudxJs90lq5E80bMsk/view?usp=sharing" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>MindLab Project</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Kärle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Şimşek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gerrier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Angele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fensel</surname></persName>
		</author>
		<idno>D544y2</idno>
		<ptr target="https://drive.google.com/file/d/1dxlVMvwiy9C8pn0IwJEQ6REltE-Qcy-M/view" />
		<title level="m">KARL SWS Integrator</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>MindLab Project</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Angele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Holzknecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Huaman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Panasiuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Şimşek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fensel</surname></persName>
		</author>
		<idno>D312y2</idno>
		<ptr target="https://drive.google.com/file/d/1RudX-yt9JxomMb6OBCi4UD10vLtqWZBv/view" />
		<title level="m">VeriGraph: A Verification Framework for Knowledge Integrity</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>MindLab Project</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">CASM: Implementing an Abstract State Machine Based Programming Language</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lezuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Barany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Software Engineering 2013 - Workshopband</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
