<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Graph Injection for Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Wardenga</string-name>
          <email>wardenga@infai.org</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liubov Kovriguina</string-name>
          <email>lk@metaphacts.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitrii Pliukhin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniil Radyush</string-name>
          <email>daniil.radyush@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Smoliakov</string-name>
          <email>smol.ivan97@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuan Xue</string-name>
          <email>xue@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henrik Müller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksei Pismerov</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Mouromtsev</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Kudenko</string-name>
          <email>kudenko@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Forschungszentrum L3S</institution>
          ,
          <addr-line>Appelstraße 9a, 30167 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer IAIS Dresden</institution>
          ,
          <addr-line>Schloss Birlinghoven 1, 53757, Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>Kronverksky Pr. 49, bldg. A, St. Petersburg, 197101</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institut für Angewandte Informatik (InfAI) e. V.</institution>
          ,
          <addr-line>Goerdelerring 9, 04109 Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>TIB - Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek</institution>
          ,
          <addr-line>Welfengarten 1B, 30167 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>metaphacts GmbH</institution>
          ,
          <addr-line>Daimlerstraße 36, 69190, Walldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In reinforcement learning (RL) an agent usually learns the specifics and rules of the environment via interaction. This limits the agent to taking the best action only from the current observation and past experience. Therefore, providing relevant external knowledge to RL agents, as well as incorporating learned knowledge in the RL process, can be a critical part of an agent's successful training in real-world tasks. We propose a method, an architecture and experimental results for injecting expert knowledge in the form of RDF knowledge graphs (KGs) into RL processes, showing how knowledge consumption increases sample efficiency. Furthermore, we investigate the scalability of our approach with respect to the complexity of the underlying task, showing that injection of KGs is beneficial to the solution of more complex RL tasks. For experimental evaluation we used the Minigrid environment, which is a standard benchmark for RL. For this environment, we designed an ontology and generated a KG that promotes reusability and interoperability across heterogeneous data of the environment. We show that adding knowledge to the agent's learning process improves sample efficiency and that the benefits increase with the complexity of the environment.</p>
      </abstract>
      <kwd-group>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Knowledge Injection</kwd>
        <kwd>State Representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recently, RL has become increasingly popular as an approach for solving a variety of problems
in numerous fields. These problems include, for example, training foundation models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], dialog
management, healthcare, personalized suggestions and recommendations, gaming, self-driving
cars, and many other applications in different domains. This success is a result of the progress of
such methods as DQN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], AlphaGo [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], AlphaZero [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and others. These methods efficiently
generate abstractions and learn policies in tasks with large state spaces.
      </p>
      <p>Workshop at ISWC 2023 on DEEP LEARNING FOR KNOWLEDGE GRAPHS</p>
      <p>At the same time, some tasks remain challenging for RL, namely when an agent does not have the
opportunity to obtain in advance all the necessary knowledge about the specifics of the
environment with which it has to interact, and must adapt to changing conditions and somehow decide
which of the knowledge relevant to the environment to use. Moreover, the provision,
integration, and use of external knowledge for agents is also a challenge for knowledge engineering,
since the encoding of observations about the environment can change dynamically. Sometimes, the
domain (environment) is too complex to be efficiently represented by standard RL approaches.
Knowledge graphs serve as a rich representation model for knowledge that can be expressed
in terms of entities and relations, as well as larger frame structures, i.e. events. However, for
complex or unseen tasks there is often no clear idea how the knowledge should be consumed or
which knowledge snippets are most relevant for a downstream task (for example, when long-term
reasoning is required). The area of RL, on the contrary, brings the opposite: approaches to
model behavior and the decision-making process towards solving tasks via interaction with the
environment. The interactive nature of RL allows agents to self-learn with provided knowledge,
integrate various data sources without tedious data annotation and curation processes, and get
better results on complex knowledge-seeking tasks. The motivation of the current paper is to
provide theoretical and practical ground for large-scale bridging of recent advances in Semantic
Web technologies and knowledge graph representations with RL pipelines.</p>
      <p>
        This research setup looks a priori promising, because knowledge injection into deep neural
networks has been shown to be a powerful paradigm for adding external knowledge to neural
architectures, easing the learning process and enabling them to reason better in
downstream tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, knowledge in such scenarios has been understood in a
very broad, if not vague, sense: expert notes, tables, data snippets retrieved from databases,
etc. In the area of language models, knowledge graph injection has turned out to be especially
beneficial, although the questions of which knowledge should be injected, and when, as well
as the problem of heterogeneous vector spaces, are still not fully solved. We are tackling the
problem of knowledge graph injection into RL pipelines, trying to mitigate the challenges both in
knowledge graphs and RL, and opening a broad path for bridging RL and the Semantic Web, where
agents can improve their learning process from existing knowledge graphs, and the latter
can be refined from the interaction process of agent and environment (similarly to RLHF [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <p>
        Learning of RL agents derives from interaction with the environment, in which basic RL
algorithms, such as Q-learning, generally require visiting a large number of the available states
many times before a satisfactory policy, or map from states to actions, can be achieved. Learning
problems, besides the "curse of dimensionality" [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], are also induced by the incompleteness and
variability of knowledge about the environment in real-world applications. These limitations of
state space representations in RL can be tackled with KGs, which provide rich and expressive
ways to represent and analyze weakly structured domain data and to explore the domain structure.
      </p>
      <p>
        In this article, we propose a new approach to augment observations with relevant knowledge.
The knowledge is provided as domain KG embeddings, contextualized with the agent’s state.
This approach works independently of the specific underlying deep RL method. We evaluate
our approach by extending the vanilla DQN algorithm with a knowledge injection component
(Sec. 4) and show performance improvements across 7 Minigrid environments [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] of different
complexity (see Sec. 5.2).
      </p>
      <p>The key contributions of the paper can be summarized as follows:
• KGRL, a method and an architecture for on-the-fly retrieving and injecting relevant
knowledge from RDF KGs to augment RL pipelines.
• An open-source Minigrid ontology and an associated KG generator, which may serve as a
testing ground for KG injection1.
• Experimental results, demonstrating that knowledge injection reduces the cost of learning
new policies, especially for more complex environments.</p>
      <p>The paper is structured as follows: the introduction is followed by preliminaries in Section 2
before discussing related work in Section 3, continued with a detailed description of the KGRL
approach in Section 4 and the experimental setup and results in Section 5, finishing with conclusions
and future work in Section 6. The Appendix to the paper contains descriptions of the Minigrid
environments with their complexity classes, and learning curves, and the Appendix in the repository
provides information on the experimental setup, hyperparameter tuning, training runs, KGRL
adaptation to other domains, the Minigrid ontology visualization, and a list of abbreviations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>In our work we consider scenarios that extend the standard RL architecture with the components
required to support the agent with relevant knowledge, retrieved from knowledge graphs and
contextualized with the current observation. The preliminaries cover the basic
concepts of the approaches and methods that were reused or extended in KGRL, highlighting
how they can benefit from each other.</p>
      <sec id="sec-2-1">
        <title>2.1. Reinforcement Learning and Deep Q-Learning</title>
        <p>
          Reinforcement learning involves an agent interacting with an environment and making a sequence
of action decisions. An RL task can be modeled as a Markov Decision Process (MDP)
M = (S, A, P, R, γ), a tuple consisting of a state space S, an action space A, the transition dynamics
P(s, a, s′) : S × A × S → [0, 1], the reward function R(s, a, s′) ∈ ℝ, and a discounting factor
0 &lt; γ ≤ 1.
        </p>
        <p>The goal of an RL agent is to learn the optimal policy which maximizes the expected discounted
reward by interacting with the environment. Given an arbitrary policy π, the action-value function
Q<sup>π</sup>(s, a) indicates the expected discounted return of executing action a in the current state
s and following the policy π afterwards. On the other hand, given an action-value function
Q(s, a), a greedy policy can be induced by selecting the action with the highest Q-value.</p>
        <p>
          Q-learning [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a classic model-free RL algorithm, finds the optimal policy π* by learning
the optimal Q-function
Q*(s, a) = max<sub>π</sub> E[ ∑<sub>t=0</sub><sup>∞</sup> γ<sup>t</sup> r<sub>t</sub> | s<sub>0</sub> = s, a<sub>0</sub> = a, π ].
The Q-function is iteratively updated according to the Bellman optimality equation:
Q<sub>t+1</sub>(s, a) = (1 − α) Q<sub>t</sub>(s, a) + α [ r(s, a, s<sub>t+1</sub>) + γ max<sub>a′</sub> Q<sub>t</sub>(s<sub>t+1</sub>, a′) ].
Deep Q-learning (DQN) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] approximates the Q-function with a deep neural network that is
characterized by its weights θ. The objective function of DQN to minimize is:
Δ = E<sub>(s, a, s′, r)∼D</sub>[ r + γ max<sub>a′</sub> Q̂(s′, a′) − Q<sub>θ</sub>(s, a) ],
where D is the experience replay buffer, which caches recent transition history and helps
improve the sample efficiency of learning, i.e. the agent can achieve a higher reward with the same
amount of interactions or samples from the environment. Q̂ is the target Q-network, which
copies the weights of Q<sub>θ</sub> at fixed intervals of time steps. This technique has been proven
to be critical for the stability of training performance.
        </p>
        <p>1 We provide a repository containing additional results, ablations and training runs, the ontologies for the Minigrid
environment, code to generate KGs from Minigrid environments, and code and a database of hyperparameters to
perform all experiments, as well as detailed descriptions on reproduction: https://github.com/wardengainfai/KGRL</p>
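        <p>The tabular update above can be sketched in a few lines of Python. This is an illustrative toy example, not the paper's DQN implementation; state and action identifiers are arbitrary.</p>

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # One tabular Q-learning step:
    # Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q

Q = defaultdict(float)  # all Q-values start at 0
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
# Q[(0, 1)] moves one step of size alpha toward the bootstrapped target
```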
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontologies, Knowledge Graphs and Knowledge Graph Embeddings</title>
        <sec id="sec-2-2-1">
          <title>Ontologies</title>
          <p>
            are explicit formal specifications of the terms in a domain and the relations among
them [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], which provide humans and machines with an accurately understandable context or
meaning and ensure a common understanding of information2. Ontology modeling languages (e.g.
OWL, an ontology language for the Web based on description logic [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]) are characterized by
formal semantics, and include descriptions of concepts in a domain of discourse (classes,
sometimes called concepts), properties of each concept describing its various features and attributes
(slots, sometimes called roles or properties), and restrictions on slots [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. OWL
has a rich and expressive syntax, allowing one to describe quite complex abstractions, e.g. events and
temporal relations.
          </p>
          <p>
            Knowledge graphs represent structured collections of facts describing the world in the form
of relationships between entities [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. KGs are a popular model for representing static,
graph-structured data, but they do not offer an effective means to learn and discover processes, like RL
does. Formally, a KG is a graph-based representation model whose nodes represent entities and
whose edges represent relations between these entities [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], shaped as a set of triples: "for a given set
of entities ℰ and set of relations ℛ, we consider a knowledge graph 𝒦 ⊆ ℰ × ℛ × ℰ as a
directed, multi-relational graph that comprises triples (h, r, t) ∈ 𝒦 in which h, t ∈ ℰ represent
a triple’s respective head and tail entities and r ∈ ℛ represents its relationship." [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]
          </p>
          <p>
            Among the chief benefits of OWL/RDF KGs are their formal representation richness, reasoning
capabilities, a large inventory of validation and visualization tools, reusability, and an inherent
capability for flexible evolution and lifelong population. Representation richness, established
methods for incompleteness handling and the never-ending population of knowledge graphs can be
considered most beneficial for RL applications. Moreover, a well-elaborated pool of KG
embedding algorithms has opened frontiers for their injection into (large) language models and
other NLP and ML pipelines [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ].
          </p>
          <p>
            Knowledge Graph Embedding Models (KGEMs) learn latent representations for entities
e ∈ ℰ and relations r ∈ ℛ in a KG that best preserve its structural properties [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ],[
            <xref ref-type="bibr" rid="ref18">18</xref>
            ] within
an embedding space X (often X = ℝ<sup>d</sup> with embedding dimension d). The authors in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]
define a KGEM as four components: an interaction model, a training approach, a loss function, and
its usage of explicit inverse relations. An interaction model f : ℰ × ℛ × ℰ → ℝ computes a
real-valued score representing the plausibility of a triple (h, r, t) ∈ 𝒦 given the embeddings of
the entities and relations. In general, a larger score indicates a higher plausibility of a triple, but,
being model-dependent, it cannot be directly interpreted as a probability. The pool of KGEMs
is quite diverse: translation models, tensor factorization models, semantic matching models,
hyperbolic embeddings [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ].
          </p>
          <p>2 https://cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/owl-101/</p>
          <p>
            In KGRL, we start with the most general and widely adopted translation models (TransE, TransR,
RotatE, etc.), which show good performance across tasks. TransE [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] is an energy-based model
for learning low-dimensional embeddings of entities that models relations as translations of
head to tail embeddings, i.e. e<sub>h</sub> + e<sub>r</sub> ≈ e<sub>t</sub>. Following the notation in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], the interaction model
is defined as f(h, r, t) = −‖e<sub>h</sub> + e<sub>r</sub> − e<sub>t</sub>‖<sub>p</sub>, where p ∈ {1, 2} is a hyper-parameter. Despite its
simplicity, TransE is known for computational efficiency and good performance in modeling
most kinds of relationships. However, it inherently cannot model 1-N, N-1, and N-M relations.
          </p>
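          <p>The TransE interaction model can be illustrated in a few lines of Python. The embeddings below are toy vectors, not trained ones, and `transe_score` is an illustrative name, not part of any KGEM library.</p>

```python
import numpy as np

def transe_score(e_h, e_r, e_t, p=1):
    # TransE interaction model: f(h, r, t) = -|| e_h + e_r - e_t ||_p
    # (a larger, i.e. less negative, score means a more plausible triple)
    return -float(np.linalg.norm(e_h + e_r - e_t, ord=p))

e_h = np.array([0.5, 0.25])   # toy head-entity embedding
e_r = np.array([0.25, 0.5])   # toy relation embedding
e_t = np.array([0.75, 0.75])  # toy tail-entity embedding
# here e_h + e_r equals e_t exactly, so the triple gets the maximal score 0.0
```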
          <p>
            Linking Text and Knowledge Graphs Entity and relation linking (ER), the task of
grounding text fragments to entities and relations of a specific KG [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ],[
            <xref ref-type="bibr" rid="ref22">22</xref>
            ], serves as a bridge
between unstructured text and KGs. The constant evolution of KGs and language requires entity and
relation linking algorithms to be flexible and adaptive, able to handle out-of-vocabulary and
rare cases. ER linking is a crucial part of most approaches to knowledge injection in language
models (LMs). Knowledge injection aims to augment LMs with missing knowledge, improving
LMs’ awareness of missing facts and mitigating their incompleteness. Multiple interesting
approaches have appeared for this task, employing joint training, vector space alignment,
alignment on the text level, and other methods [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ],[
            <xref ref-type="bibr" rid="ref23">23</xref>
            ],[
            <xref ref-type="bibr" rid="ref24">24</xref>
            ]. This direction, in combination with RL,
opens new frontiers in training more knowledge-aware models handling heterogeneous data.
          </p>
          <p>
            In multimodal KGs (i.e. those combining text, image, and geo-spatial data [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ]), one of the modalities
can often serve as the interlinking layer for the others (e.g., speech and image can be interlinked
via the text modality). This has been explored in depth for entity and relation extraction and linking
from text, but also for linking images with KGs [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]. Advances in multimodal ER linking can be
especially helpful to ground natural language tasks in heterogeneous RL environments, allowing
linking of multimodal environment observations to the KG.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        Provision of knowledge to an RL agent has been studied from different perspectives. Some
approaches concentrate on keeping knowledge of past experience (e.g., from interaction
with the textual environment in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]) or of the environment topology in the agent’s memory. For
instance, Humphreys et al. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] propose an approach in which agents can utilise large-scale
context-sensitive database lookups to support their parametric computations, concluding that
large-scale retrieval leads to a performance boost.
      </p>
      <p>
        Xue et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] treat the Markov Decision Process (MDP) as a graph and employ graph learning
methods to encode knowledge about its topology into an embedding space, upon which an
abstraction of the state space is generated via clustering methods. An abstract MDP (AMDP) is
then constructed, which can be solved exactly using dynamic programming. The resulting value
function is used as a source of reward shaping for solving the original MDP, which helps to
improve the sample efficiency of the RL agent. This method provides a pipeline to automatically
generate an AMDP to help speed up RL. However, constructing and solving the AMDP needs to
be done before the real RL policy optimization starts, which introduces additional overhead
to RL learning, and dynamic programming does not scale up well. Burden and Kudenko [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] also
seek to construct an abstract MDP to induce reward shaping. The abstraction is generated by
uniformly partitioning the state space. The cost of building the AMDP is reduced, but the topology of
the state space is ignored, which can lead to misleading reward shaping.
      </p>
      <p>
        In KG-DQN [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] the authors concern themselves with solving text-based games, compounding
local perceptual knowledge in a KG. The method proposed in [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] utilises a commonsense KG
and exhibits a certain similarity to our approach. While [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] also adds commonsense knowledge
via linking, the approaches of [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] use GNNs to feed the obtained knowledge to a
specific RL model. In contrast, our approach uses KGEs and provides knowledge in an RL-model
agnostic way, which allows knowledge to be integrated into a wide range of RL methods easily. The
closest approach to KGRL was proposed for recommendation systems in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], where the authors use
prior knowledge to enrich the representation of items and user states and to guide candidate
selection for better candidate item retrieval. However, the proposed KGRL architecture has
a number of benefits in comparison to that work. KGRL allows broader interaction between
modalities (image, text, etc.) with less training and is easier to configure, being implemented
in “lego” style: the choice of formats and technologies allows juggling different RL methods,
LLMs, KGEMs and KGs. Furthermore, we investigate the scaling w.r.t. the complexity of the
underlying task.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <sec id="sec-4-1">
        <title>4.1. Overview</title>
        <p>The proposed KGRL approach aims to augment observation vectors with KG embeddings,
retrieved from the domain KGEM and external LLMs based on the agent’s current state, and
injected during the learning process. This approach works independently of the underlying deep
RL method and is applicable to any RL method where state and observation can be represented
as vectors.</p>
        <p>The approach performs step-wise knowledge injection into the RL model at the observation
level using a KG wrapper. Within the KG wrapper there are two main types of components:
linkers, which ground non-structured data in the observation to KG snippets (subgraphs),
and retrievers, which retrieve relevant embeddings for the input subgraph. The process goes
as follows: at each time step the observation is split by the data types it contains, and these
parts of the observation are augmented with knowledge snippets by linkers:
• text data are processed by the entity and relation linker, which returns a list of relevant triples
of the KG,
• image data, e.g. describing the view of the agent, are processed by the object detection linker,
which also returns a list of entities,
• additional state representations, which can contain heterogeneous data, may require
specific environment-dependent linkers, e.g. a geo-spatial linker or a bio-chemical linker.
These extracted knowledge snippets are routed to the knowledge retriever, which retrieves
the KGEs relevant to these snippets (Sec. 4.2.2). Thus, at each iteration the agent observes only
the knowledge relevant to its state. The KGRL architecture is shown in Figure 1 for the Deep
Q-Network.</p>
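        <p>The wrapper logic described above can be sketched as follows. This is a minimal sketch, not the actual KGRL code: `linker` and `retriever` are placeholder callables standing in for the KG linkers and retrievers, and the environment interface is a simplified Gym-style one.</p>

```python
import numpy as np

class KGWrapper:
    """Sketch of an observation wrapper that appends retrieved KG
    embeddings to the raw observation vector at every step."""
    def __init__(self, env, linker, retriever):
        self.env, self.linker, self.retriever = env, linker, retriever

    def _augment(self, obs):
        snippets = self.linker(obs)          # ground the observation in the KG
        kg_feats = self.retriever(snippets)  # look up the relevant KG embeddings
        return np.concatenate([obs, kg_feats])

    def reset(self):
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info
```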
        <p>Figure 1: The KGRL architecture for the Deep Q-Network. A KG wrapper (KG linkers for NLU, vision and symbolic data, KG retrievers, and a features extractor backed by the KG and an LM) sits between agent and environment and describes the knowledge injection process for a general RL algorithm that operates with states, observations and rewards (O<sup>text</sup> denotes text data, O<sup>add</sup> heterogeneous data, O<sup>image</sup> image data, and h the KGEs and RDF triples).</p>
        <p>
          Using the example of Q-learning, we formally describe the adaptations made in KGRL. The
Q-function update is only modified with an additional input k := F(s), the KG-based features
extracted from the current state s:
Q<sub>t+1</sub>(s, a, k) := Q<sub>t</sub>(s, a, k) + α ( r + γ max<sub>a′</sub> Q<sub>t</sub>(s′, a′, k′) − Q<sub>t</sub>(s, a, k) ).
We can further decompose F : S → X into
F(s) = R* (E* (s)),
with E* : S → 𝒫(ℰ × ℛ) denoting an extractor, mapping the state space S into the powerset of
the product space of entities ℰ and relations ℛ in the KG, and R* : 𝒫(ℰ × ℛ) → X a retriever,
mapping to the product embedding space.
        </p>
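        <p>The decomposition F = R* ∘ E* amounts to simple function composition, which can be sketched as follows. The extractor and retriever below are toy stand-ins, not the concrete KGRL components.</p>

```python
def make_feature_map(extractor, retriever):
    # Compose F = R* o E*: a state is first mapped to KG snippets by the
    # extractor E*, whose output the retriever R* maps to embeddings.
    return lambda state: retriever(extractor(state))

# toy instantiation: the "extractor" grounds a state id to entity names,
# the "retriever" returns one scalar feature per entity
F = make_feature_map(lambda s: [f"entity_{s}"],
                     lambda ents: [len(e) for e in ents])
```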
      </sec>
      <sec id="sec-4-2">
        <title>4.2. KGRL Components</title>
        <p>We implement our approach as an observation wrapper, called KG wrapper, over the original
environment. The main components of this wrapper are (1) resources, such as the KG and LLMs,
(2) KG linking, (3) KG retrieval, and (4) KG embedding of the retrieved subgraph.
In the following subsections we describe the KG linking and KG retrieval in more
detail. The process for adapting KGRL to a new domain is described in Appendix C.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Knowledge Graph Linking</title>
          <p>Entity and Relation Linking in KGRL For this purpose, KGRL uses one of the pre-trained SBERT
models3 tailored to the Semantic Textual Similarity task, which facilitates entity and relation
retrieval from the knowledge graph for a specified environment. Specifically, each triple in the
knowledge graph is represented by the concatenation of its subject and object labels, and the model
evaluates the cosine similarity between triple labels and n-grams from the textual description
of the environment (e.g. the mission string for Minigrid). Subsequently, a given number of the most
relevant knowledge graph triples provide additional information about the environment in
the form of their embeddings and confidence scores (cosine similarity metrics) for the agent
to enhance its decision-making process (an illustration of this process can be found in the
repository).</p>
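          <p>The triple-ranking step can be sketched with plain cosine similarity. The vectors below are toy stand-ins for SBERT sentence embeddings of verbalized triples and of the mission string; `link_triples` is an illustrative name.</p>

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def link_triples(mission_vec, triple_vecs, k=2):
    # Rank verbalized KG triples by similarity to the mission embedding
    # and return (index, score) pairs for the k most relevant ones.
    scores = [(i, cosine(mission_vec, v)) for i, v in enumerate(triple_vecs)]
    return sorted(scores, key=lambda s: -s[1])[:k]
```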
          <p>Symbolic KG linking Heterogeneous and environment-specific state representations (e.g.,
codes, coordinates) can be linked to a KG via symbolic linking, i.e. character-based
matching agnostic of the underlying semantics. In the case of the Minigrid environment,
additional state representations are given as arrays corresponding to cells in the grid. We use
this representation to link a cell in the grid to a cell entity in the KG.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Knowledge Graph Retrieval</title>
          <p>In order to make the information in the KG available to the agent, we first have to retrieve a
subset of the graph relevant to the current state of the agent. This happens on the basis of the
extractions from the original observation. The extractions are fed into the retrieval methods,
which return relevant subsets of the KG as embedding vectors. This additional information
allows the agent to navigate the latent high-dimensional knowledge graph embedding space
in order to complete tasks. This is a crucial part of the pipeline, as the underlying knowledge
base can be huge and passing the entire KG might be infeasible. Below we introduce two
methods for retrieval:</p>
          <p>k-NN Retrieval The k-nearest neighbors of the extracted entities in the KGE space are
determined, including the extracted entities themselves. This method returns embeddings of semantically
similar entities, as well as of entities that are connected to the extracted entities by a short path.
As the computation of a k-NN index can be done in advance, this method scales well in the
number of steps.</p>
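          <p>A brute-force version of the k-NN retrieval can be sketched as follows. A real implementation would use a precomputed k-NN index rather than recomputing distances; names here are illustrative.</p>

```python
import numpy as np

def knn_retrieve(embeddings, entity_ids, k=2):
    # For each extracted entity, return the indices of its k nearest
    # neighbours in the KGE space (the entity itself included).
    out = {}
    for i in entity_ids:
        dists = np.linalg.norm(embeddings - embeddings[i], axis=1)
        out[i] = list(np.argsort(dists)[:k])
    return out
```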
          <p>Random walk Retrieval This method retrieves random walks of fixed length, starting at
the extracted entities and consisting of the embeddings of encountered entities and relations. This is
computationally slightly more expensive than the k-NN look-up, but has the advantage of allowing
deeper exploration of the latent KGE space due to the inherent stochasticity of the process.</p>
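          <p>A minimal sketch of the random-walk retrieval, assuming the KG is given as an adjacency dict and the embeddings as plain lists; all names are illustrative.</p>

```python
import random

def random_walk_retrieve(adj, emb_e, emb_r, start, length=3, rng=random):
    # One fixed-length random walk from `start`; returns the embeddings of
    # the entities and relations encountered. `adj` maps each entity to a
    # list of outgoing (relation, entity) pairs.
    walk = [emb_e[start]]
    node = start
    for _ in range(length):
        if not adj.get(node):
            break  # dead end: stop the walk early
        rel, nxt = rng.choice(adj[node])
        walk += [emb_r[rel], emb_e[nxt]]
        node = nxt
    return walk
```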
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Experimental Setup</title>
        <p>To evaluate our hypotheses, we perform a wide range of experiments on environments of
different complexities. We run the training cycles for the baseline as well as for different
configurations of our proposed method. For each environment, we conduct a hyperparameter search
prior to training in order to find decent training parameters for the baseline algorithm. This
ensures that the baseline has a chance to converge. We do not perform further
hyperparameter search for our proposed modifications. Further improvements are likely possible by
searching for the best knowledge graph embedding model for a given domain, as well as by tuning
the dimension and the size of the retrieved subset of the KG.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Running KGRL on the MiniGrid Environment</title>
        <p>MiniGrid Ontology For the Minigrid environment we design an OWL ontology that
completely describes Minigrid environments from the perspective of the available tasks. We describe
artifacts, their states and properties (e.g., colour and location), the agent's actions, and the relations
between them. The visualized ontology and OWL/TTL files can be found in the repository and
in Appendix D.1.</p>
        <p>For example, the ontology describes which doors (class Door) a room (class Room) has, in
which cell (class Cell) it is located, what its state is (subclasses Opened, Locked and Closed of class
State) and which color it is assigned. Conversely, a color can also be assigned to keys
(class Key), creating a short connection between doors and the keys they can be opened with.</p>
        <p>Knowledge Graph Generation from Environment Representation Based on the above
ontology we automatically generate a KG from the environment representation. The
representation is given as an array of size grid size × 3, giving for each cell its content, the state of
the content, and its color. The generated KG describes the environment completely in terms of the
positions, states and interactions of objects. We use this KG as exterior knowledge that the
agent can explore in parallel to the environment.</p>
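<p>The generation step can be sketched as follows; the content/state/color codes and the predicate names are illustrative stand-ins for the ontology's actual vocabulary:</p>

```python
# Generating KG triples from the (grid size x 3) array representation:
# each row encodes a cell's content, the state of the content, and its color.

OBJECTS = {0: None, 4: "Door", 5: "Key"}          # hypothetical content codes
STATES  = {0: "Opened", 1: "Closed", 2: "Locked"}
COLORS  = {0: "red", 1: "green", 4: "yellow"}

def grid_to_triples(grid):
    triples = []
    for i, (content, state, color) in enumerate(grid):
        cell = f"cell_{i}"
        obj = OBJECTS.get(content)
        if obj is None:                           # empty cell: nothing to add
            continue
        ent = f"{obj.lower()}_{i}"
        triples += [(ent, "locatedIn", cell),
                    (ent, "hasState", STATES[state]),
                    (ent, "hasColor", COLORS[color])]
    return triples

t = grid_to_triples([(0, 0, 0), (4, 2, 4)])       # empty cell, locked yellow door
assert ("door_1", "hasState", "Locked") in t
```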
        <p>
          Extraction and Retrieval For retrieving from the KG we associate the grid cell where the
agent is currently located with the corresponding cell entity in the KG and retrieve a relevant
subset of the entire KG with the methods described in Section 4.2.2, starting from this entity. Furthermore,
as this environment returns a static mission string (see Table A), we also augment our approach
with a transformed mission, which consists of the n most relevant triples (h<sub>i</sub>, r<sub>i</sub>, t<sub>i</sub>)<sub>i=0</sub><sup>n</sup>, along with
their confidence scores (c<sub>i</sub>)<sub>i=0</sub><sup>n</sup>, where h<sub>i</sub>, t<sub>i</sub> ∈ ℰ are entities found in
the mission string. We use the translational embedding model TransE to embed the retrieved
subset, as this model has shown good performance across different applications [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
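<p>TransE scores a triple (h, r, t) as plausible when the head embedding translated by the relation embedding lands near the tail embedding. A minimal sketch with toy two-dimensional vectors (not a trained model):</p>

```python
import numpy as np

# TransE plausibility: score(h, r, t) = -||h + r - t||, so higher is better.
# Retrieved triples can thus be scored and embedded in one vector space.

def transe_score(h, r, t):
    """Negative L2 distance between the translated head and the tail."""
    return -np.linalg.norm(h + r - t)

door, key = np.array([1.0, 0.0]), np.array([0.0, 1.0])
opened_by = key - door          # the relation vector TransE would learn here
assert transe_score(door, opened_by, key) == 0.0          # perfect match
assert transe_score(door, opened_by, door) == -np.sqrt(2)  # implausible triple
```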
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Training Runs and Evaluation</title>
        <p>We select environments featuring tasks of varying complexity and size to perform our
experiments. In each of these environments we perform a training run for ten different seeds to verify
our approach on different samples of the environments. During training, we evaluate the trained
policy 200 times at regular intervals. The average episode length (i.e., the number of steps to
complete a task) as well as the average rewards achieved in the evaluation episodes are reported.
We additionally compute the average number of training timesteps needed to achieve a given
reward. By averaging over different environments and concentrating on relative timesteps
based on the pointwise best achieved training steps, this allows us to draw conclusions about
the average gain in terms of sample efficiency. This procedure is performed for the baseline,
for which we choose DQN (in the following referred to as plain), and three configurations of
our approach. The first configuration knn consists of the inclusion of the k-nearest neighbors of the
current state of the agent. The second configuration rw augments the observation with
the knowledge graph embeddings of entities and relations encountered in a random walk of
length k starting from the current state, and the third configuration rw-ner additionally returns
embeddings of the triples extracted from the mission string in Minigrid.</p>
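<p>The relative sample-efficiency statistic can be sketched as below; the evaluation curves are made-up illustrations, and the exact interpolation the paper uses may differ:</p>

```python
# For a target reward, count the training timesteps each method needs to
# first reach it, then report the baseline-to-method ratio: values above 1
# mean the method needed fewer steps than plain DQN.

def steps_to_reward(curve, target):
    """curve: list of (timestep, eval_reward); first timestep reaching target."""
    for step, reward in curve:
        if reward >= target:
            return step
    return None                           # never reached the target reward

plain = [(10_000, 0.2), (50_000, 0.6), (90_000, 0.9)]
knn   = [(10_000, 0.4), (30_000, 0.9)]

ratio = steps_to_reward(plain, 0.9) / steps_to_reward(knn, 0.9)
assert ratio == 3.0   # knn reached the best reward in a third of the steps
```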
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Results</title>
        <p>The results of our training runs are reported in Figure 3 in the Appendix, along with further
training runs evaluated with perturbed actions. Combining the results across environments in
one complexity class, we find that the performance gain increases with the size of the underlying
grid and the complexity of the task, see Figure 2. This is further highlighted in Figure 3 in
the Appendix, which shows that the average number of training steps to reach the highest
reward is half of that of the baseline without Knowledge Graph injection. For the most complex
environments, knowledge injection leads to the highest rewards in a third of the training steps
on average (see also Figure 2).
Figure 2: Relative sample efficiency of learning with respect to reward: the timesteps needed to achieve the best reward with the plain DQN divided by
the timesteps needed by the respective approach * to achieve the same reward (higher is better). Bars
represent averages over all environments in the complexity category.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>Our experimental results show that the KGRL method significantly reduces the number of training
steps needed to achieve high rewards, with better results for more complex environments,
where decision making by the agent requires more knowledge to adapt. At the same time, we see
that techniques of KG injection differ in efficiency, depending on the specificity and structure
of the KGs. For example, simpler environments are characterized by the presence of topological
information in the KGs, while complex environments also include more domain-specific rules
(e.g., using certain keys for given doors). All of this raises additional questions about the
ways to retrieve, encode, and inject knowledge into RL pipelines. We believe that this effect
confirms our hypothesis that KG injection in RL pipelines at scale improves decision making
in complex domains. In future work we will explore how the quality and completeness of the
used KG influences training parameters, also testing whether RL can be used to evaluate the quality of
ontology creation and fact extraction approaches. Another promising research direction lies
in applying KGRL to other domains, and experimenting with more elaborate approaches for
integrating knowledge graphs with LLMs in a single pipeline for better alignment.</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix A</title>
    </sec>
    <sec id="sec-8">
      <title>Minigrid Environment</title>
      <p>
        Minigrid [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], formerly known as Gym-Minigrid, is a grid world environment with a variety of
configurations and tasks. A sparse reward of 1 minus a small penalty for each interaction step
with the environment is obtained once the given task has been completed. The interaction
penalty is chosen in such a way that the reward is positive upon completion. The default
observation returned by the Minigrid environments consists of the restricted view of the agent,
the direction it is facing, and a mission string, which does not change once the environment is
initialized. In the following paragraphs we shortly describe a selection of the environments
within Minigrid.
      </p>
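<p>The reward scheme can be sketched as follows; the 0.9 factor mirrors MiniGrid's default step discount, but the exact constant should be treated as an assumption here:</p>

```python
# Sparse reward on task completion: 1 minus a per-step penalty, calibrated
# so the reward stays positive even when the agent uses all allowed steps.

def completion_reward(step_count: int, max_steps: int) -> float:
    return 1.0 - 0.9 * (step_count / max_steps)

assert completion_reward(0, 100) == 1.0              # instant completion
assert completion_reward(100, 100) > 0               # positive at the limit
assert completion_reward(50, 100) > completion_reward(100, 100)
```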
      <p>We divide the environments into three classes (simple, medium and high) according to
their complexity, estimated via properties of the knowledge graph generated for
the environment, such as the number of nodes, entities, relations, etc. The distribution of
environments into complexity classes is shown in Appendix A.1.</p>
      <p>Empty and Four Rooms are simple environments without additional objects in the grid and
with the task of navigating to a goal cell. MultiRoom environments additionally contain doors between
a flexible number of rooms that have to be opened by the agent, and Door Key
environments also contain a key and locked doors. Here the agent first has to collect the key in
order to open the door that separates it from the goal. Lava Gap as well as Lava Crossing
environments require the agent to navigate to a goal while avoiding lava, which on entering
would terminate the episode with zero reward. In addition, the Key Corridor, Obstructed Maze
and Blocked Unlock Pickup environments include further objects the agent has to interact with
to complete the task. For instance, there are balls and boxes that can be moved to unblock a
connection, or boxes that contain other objects.</p>
      <p>The Minigrid environments come with a mission string describing the task, see Table A:
Empty: "get to the green goal square";
DoorKey: "use the key to open the door and then get to the goal";
Lava Gap and Lava Crossing: "avoid the lava and get to the green goal square";
Key Corridor: "pick up {color} {object}";
Obstructed Maze: "pick up the blue ball".</p>
      <p>Table 2 describes the distribution of environments across three complexity classes (easy, medium
and hard) according to the complexity of the knowledge graph describing the environment,
listing for Lava Gap S5, Lava Gap S7, Door Key 5x5, Door Key 8x8, Lava Crossing,
Key Corridor S3R2 and Obstructed Maze D1 the number of nodes, edges and artifacts in the
respective knowledge graph.</p>
      <p>We aggregate the training results for different tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Team</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Baumli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baveja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Behbahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhoopchand</surname>
          </string-name>
          , N. BradleySchmieg,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Clay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Collister</surname>
          </string-name>
          , et al.,
          <article-title>Human-timescale adaptation in an open-ended task space</article-title>
          ,
          <source>arXiv preprint arXiv:2301.07608</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wierstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <article-title>Playing atari with deep reinforcement learning</article-title>
          ,
          <source>arXiv preprint arXiv:1312.5602</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Maddison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Den Driessche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Panneershelvam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanctot</surname>
          </string-name>
          , et al.,
          <article-title>Mastering the game of go with deep neural networks and tree search</article-title>
          ,
          <source>nature</source>
          <volume>529</volume>
          (
          <year>2016</year>
          )
          <fpage>484</fpage>
          -
          <lpage>489</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , T. Yu, Alphazero, in: Deep Reinforcement Learning, Springer,
          <year>2020</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Muralidhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marwah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpatne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <article-title>Incorporating prior domain knowledge into deep neural networks</article-title>
          ,
          <source>in: 2018 IEEE international conference on big data (big data)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>K-bert: Enabling language representation with knowledge graph</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>2901</fpage>
          -
          <lpage>2908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lauscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Majewska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          , I. Gurevych,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rozanov</surname>
          </string-name>
          , G. Glavaš,
          <article-title>Common sense or world knowledge? investigating adapter-based knowledge injection into pretrained transformers</article-title>
          ,
          <source>arXiv preprint arXiv:2005.11787</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          , T. Brown, M. Martic,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning from human preferences</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bellman</surname>
          </string-name>
          ,
          <article-title>Dynamic programming</article-title>
          ,
          <source>Technical Report, RAND CORP SANTA MONICA CA</source>
          ,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chevalier-Boisvert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Willems</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          , Minimalistic gridworld environment for gymnasium,
          <year>2018</year>
          . URL: https://github.com/Farama-Foundation/Minigrid.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Watkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dayan</surname>
          </string-name>
          ,
          <article-title>Q-learning</article-title>
          ,
          <source>Machine learning 8</source>
          (
          <year>1992</year>
          )
          <fpage>279</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gruber</surname>
          </string-name>
          ,
          <article-title>A translation approach to portable ontology specifications</article-title>
          ,
          <source>Knowledge acquisition 5</source>
          (
          <year>1993</year>
          )
          <fpage>199</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>W3C OWL Working Group</surname>
          </string-name>
          ,
          <article-title>OWL 2 Web Ontology Language Document Overview (Second Edition)</article-title>
          ,
          <year>2012</year>
          . URL: https://www.w3.org/TR/owl2-overview/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          , et al.,
          <source>Ontology development 101</source>
          ,
          Knowledge Systems Laboratory, Stanford University (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 54</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Hoyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vermue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharifzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Bringing light into the dark: A large-scale evaluation of knowledge graph embedding models under a unified framework</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Colon-Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Havasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huggins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breazeal</surname>
          </string-name>
          ,
          <article-title>Combining pre-trained language models and structured knowledge</article-title>
          ,
          <source>arXiv preprint arXiv:2101.12294</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Juan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <article-title>Low-dimensional hyperbolic knowledge graph embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:2005.00545</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Earl: joint entity and relation linking for question answering over knowledge graphs</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2018</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Titov</surname>
          </string-name>
          ,
          <article-title>Improving entity linking by modeling latent relations between mentions</article-title>
          ,
          <source>arXiv preprint arXiv:1804.10637</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph and text jointly embedding</article-title>
          ,
          <source>in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1591</fpage>
          -
          <lpage>1601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          , et al.,
          <article-title>Integrating graph contextualized knowledge into pre-trained language models</article-title>
          ,
          <source>arXiv preprint arXiv:1912.00147</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Niepert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Onoro-Rubio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Rosenblum</surname>
          </string-name>
          ,
          <article-title>Mmkg: multimodal knowledge graphs</article-title>
          ,
          <source>in: The Semantic Web: 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings 16</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>459</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Tuán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-K.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Phuoc</surname>
          </string-name>
          ,
          <article-title>Visionkg: Towards a unified vision knowledge graph</article-title>
          , in:
          <string-name>
            <given-names>O.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice (ISWC 2021), October 24-28, 2021</source>
          , volume
          <volume>2980</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2980/paper362.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ammanabrolu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Playing text-adventure games with graph-based deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the NAACL: HLT, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3557</fpage>
          -
          <lpage>3565</lpage>
          . doi:10.18653/v1/N19-1358.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Humphreys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tieleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <article-title>Large-scale retrieval for reinforcement learning</article-title>
          ,
          <source>arXiv preprint arXiv:2206.05314</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <article-title>Graph learning based generation of abstractions for reinforcement learning</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudenko</surname>
          </string-name>
          ,
          <article-title>Using uniform state abstractions for reward shaping with reinforcement learning</article-title>
          ,
          <source>in: Workshop on Adaptive Learning Agents (ALA) at the Federated AI Meeting</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>K.</given-names>
            <surname>Murugesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kapanipathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talamadupula</surname>
          </string-name>
          ,
          <article-title>Enhancing text-based reinforcement learning agents with commonsense knowledge</article-title>
          ,
          <source>arXiv preprint arXiv:2005.00811</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Interactive recommender system via knowledge graph-enhanced reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>