<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collaborative Route Finding in Semantic Mazes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Katharine Beaumont</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eoin O'Neill</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nestor Velasco Bermeo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rem Collier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UCD School of Computer Science, University College Dublin</institution>
          ,
          <addr-line>Belfield, Dublin 4</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This document describes our submission to the All The Agents Challenge: a system that interacts with the Autonomous Maze Environment Explorer (AMEE) project. Our solution is a collaborative route-finding service for semantically defined mazes. Agents equipped with reinforcement learning tools discover paths through the maze. These agents collaborate by sharing schematic knowledge of the maze (information about the maze layout). The code repository and video demonstration locations are detailed in the Online Resources section at the end of the document. Video: https://youtu.be/b2tecNJc0DE Source code: https://gitlab.com/mams-ucd/atac-maze</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Semantic Agents</kwd>
        <kwd>Intelligent Agents</kwd>
        <kwd>Reinforcement Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A challenge of semantic web technologies is the conceptual integration of heterogeneous sources
of data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A possible solution is the creation of ontologies for knowledge representation, but
these can be difficult to maintain and reuse [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Furthermore, there exists a proliferation of
sometimes contradictory ontologies of varying levels of detail [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Over the last two decades, in semantic web research there has been a shift of focus from
ontologies, to linked data, to knowledge graphs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A major challenge in agent-oriented
programming is the complexity of programming agents, and building multi-agent systems.
Once deployed, multi-agent systems can be difficult to amend and adapt to new technologies.
      </p>
      <p>Thus another challenge is how to future-proof intelligent web agents. When creating a
multi-agent system composed of intelligent web agents interacting with semantic web resources,
there is a need to balance the integration of specific ontologies and schemas with agent reuse.</p>
      <p>This paper proposes a starting point for providing this balance: using state-of-the-art web
technologies, we present a system of BDI agents programmed with abstract goals and plans,
combined with modules that provide goal and plan implementations, integrating schemas and
machine learning algorithms whilst allowing for knowledge sharing between agents. This
modular approach allows for a system of agents that can be augmented with different learning
algorithms and/or different schemas, with very little change to the agent code, whilst preserving
the ability of the agents to interact effectively with a semantic web environment.</p>
      <sec id="sec-1-1">
        <title>1.1. Technologies</title>
        <p>
          The system provides an integration of semantic web, reinforcement learning, agent-oriented
software and web technologies. It is a multi-agent microservices system (MAMS), composed
of Agent-Oriented MicroServices (AOMS) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and Plain-Old MicroServices (POMS). AOMS
are microservices exposed through a well-defined interface modelled as a set of REpresentational
State Transfer (REST) resources. They are built using MAS technology [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The AOMS is the Maze
Navigation Service as presented in Figure 1.
        </p>
        <p>
          The BDI programming language ASTRA is used to create BDI agents that are reactive and
responsive to their environment, continually receiving environment events and updating their
beliefs, which are incorporated into flexible plans [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. They offer significant advantages in
developing autonomous systems and allow for the integration of a range of AI techniques [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          Using ASTRA allows us to take advantage of the module mechanism for building internal
libraries that implement custom terms, formulae, sensors, actions and events [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. A single agent
can create several copies of the same module, with different names and states [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The modules
are written in Java and accessed via the Module API.
        </p>
        <p>In this system, the aim is to strike a balance between knowledge-rich and knowledge-lean
techniques to explore the environment, by using a module (Jena Module) to interface with
semantic web technologies in order to provide domain knowledge, and a module (Navigator
Module) to process data gained from exploring the environment. The modular composition
avoids a strong commitment to one type of knowledge representation.</p>
        <p>
          Reinforcement learning capabilities are provided in the Navigator Module. The module
contains an implementation of the Q-Learning algorithm. Q-Learning uses Temporal Difference
learning which samples from the environment and performs updates based on estimates that are
revised over multiple episodes: learning is on-line and incremental [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It learns a policy: policy
learning can be seen as goal-directed [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It is well suited to dynamic, interactive environments.
Integrated with the agents, there are essentially two tiers of goals: a reinforcement learning
goal which is triggered by user interaction, and agent-level goals in the traditional BDI sense.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. System architecture</title>
      <p>The Maze Navigation Service comprises two BDI agent types: a main managing agent
(Navi), and multiple goal-based path finding subagents (PathFinder). The subagents employ a
KnowledgeStore module that incorporates Apache Jena, and a Navigator Module that provides
reinforcement learning-based navigation capabilities to support the exploration of semantic
mazes. Other modules model relevant ontologies.</p>
      <p>
        Agents should be viewed as BDI agents with a reinforcement learning tool, rather than as
hybrid BDI-reinforcement learning agents [
        <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The programmed goals and plans associated
with BDI architecture are more abstract, and the reinforcement learning module provides
implementation details required to navigate the maze and learn a policy regarding the path
to a goal room. The (BDI) goal of the agents is to respond to user input by locating a path
to whichever room is required, using the reinforcement learning module. The reinforcement
learning goal is the required room. The (BDI) plans provide iterative interaction with the
environment, directed by the reinforcement learning module, which maps the maze as it goes.
      </p>
      <p>The use of modules in the system goes beyond augmenting the agents with additional
knowledge representation technologies as certain behaviours (for example choosing the next
room to enter) are delegated to the module, and are not directly controlled by the agent. The
agent controls the learning cycle, but not the learning process. The modules do not alter the
beliefs, desires or intentions of the agent, but provide a learning tool or resource. The system also
provides for knowledge sharing between agents using Java objects which can be manipulated
in the modules, via the main agent.</p>
      <p>
        A Spring Boot Application (a POMS, the Query Manager) provides a visual interface between
the user and the Maze Navigation Service [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Navi has the url of the Query Manager as an
initial belief and, on startup, sends a POST request to a registration endpoint. When the Query
Manager gets an incoming user request for a room from the User Input, it forwards it to the
registered main agent, which has REST endpoints to receive the request.
      </p>
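      <p>As a minimal sketch of this handshake: the payload fields and the forwarding endpoint below are purely hypothetical, as the source does not specify them:</p>
      <preformat>
```python
import json

# Hypothetical sketch of the registration-and-forwarding flow described
# above. The payload fields ("name", "url") and the "/query" endpoint are
# illustrative assumptions, not taken from the actual system.
def registration_payload(agent_name, agent_url):
    """Body Navi might POST to the Query Manager's registration endpoint."""
    return json.dumps({"name": agent_name, "url": agent_url})

def forward_request(registered_agents, room_name):
    """Query Manager side: pick the registered main agent and build the
    forwarded request (target url plus JSON body)."""
    if not registered_agents:
        raise RuntimeError("no main agent registered yet")
    navi = registered_agents[0]
    return navi["url"] + "/query", json.dumps({"room": room_name})
```
      </preformat>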
      <sec id="sec-2-1">
        <title>2.1. Maze Navigation System</title>
        <p>Navi acts as a manager, and creates PathFinder subagents as required. When a subagent is
created, beliefs are added to Navi about the subagents available and the cumulative schematic
knowledge they have collected. PathFinder subagents use individual copies of key modules that allow them to both
interrogate the RDF schema for the maze, and interact with it.</p>
        <p>There are two types of communication in the system. Inter-agent communication between
Navi and the PathFinder subagents uses ASTRA, which implements a
FIPA ACL-based messaging infrastructure. Human-system communication occurs via a REST
API, as described at the end of the last section.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Jena Module (RDF Knowledge Store)</title>
        <p>
          PathFinder subagents have individual in-memory RDF knowledge stores. The Apache Jena
library, an open source framework for building Semantic Web and Linked Data applications [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], was
integrated into the subagents via the Jena module. This enables the subagents to interact with the
Maze Server over HTTP via GET requests. They initialise a Jena model, which provides an interface
for the creation, parsing and storage of triples.
        </p>
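        <p>To illustrate what the module extracts, the following stand-in uses a toy one-triple-per-line serialization instead of RDF proper; in the real system Apache Jena performs the parsing:</p>
        <preformat>
```python
# Simplified stand-in for the Jena model: the real system fetches and parses
# RDF over HTTP, whereas this sketch assumes a toy "subject predicate object"
# line format purely for illustration.
def parse_adjacency(triples_text, room_url):
    """Return the direction-to-room map for the given room."""
    adjacent = {}
    for line in triples_text.strip().splitlines():
        subject, predicate, obj = line.split()
        if subject == room_url:
            adjacent[predicate] = obj
    return adjacent

doc = """
http://maze/entry north http://maze/hall
http://maze/entry east http://maze/cellar
http://maze/hall south http://maze/entry
"""
# parse_adjacency(doc, "http://maze/entry")
# yields {"north": "http://maze/hall", "east": "http://maze/cellar"}
```
        </preformat>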
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Collaborative Semantic Navigation Process</title>
        <p>Throughout successive queries, PathFinder subagents collaborate via the agent Navi as they share
schematic knowledge about the maze, which is then communicated with any new subagents.</p>
        <p>On receiving a query, Navi loops through its list of existing subagents and communicates
with each of them in turn to see if they have any knowledge of the room in question. If the
room in question is their goal room, their belief about the path to the room is returned. If the
room in question is a step in their path to a different goal room, the subpath is returned.</p>
        <p>If no subagents have knowledge of the requested room, or no subagents exist yet, a new
PathFinder subagent is created. Any schematic knowledge of the maze is shared with the
new subagent by Navi in the LocationStore object. PathFinder has the belief "goal room"
initialised as the requested room. It also sets a goal room variable in the Navigator module. It
has additional beliefs concerning the starting url of the maze, current url, and the current and
maximum number of iterations over which to perform the learning process.</p>
        <p>One iteration of learning involves PathFinder going from the starting room of the maze to
the goal room, or reaching the maximum number of steps (not finding the room). The subagent
controls the length of the learning process through associated beliefs.</p>
        <p>Starting at the maze entry point, the Apache Jena module reads the current url, which informs
it of adjacent rooms. These are passed to the PathFinder subagent, which passes them in turn
to the Navigator module. The Navigator module performs a step of the Q Learning algorithm.
It calculates the score of the move from the previous room into the current one. The score
is calculated using awareness of the objective (arrival at the user-input goal room) instead of
environmental reward signals. Once the goal room is reached, an artificial score is allocated.
This differs from the Q-Learning algorithm, which takes environmental feedback as the score.</p>
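        <p>This scoring step can be sketched as a standard Q-Learning update with the artificial reward described above (1 at the goal room, 0 elsewhere); the learning-rate and discount parameters are illustrative assumptions:</p>
        <preformat>
```python
# Q-Learning-style update for the move prev_room to curr_room, assuming the
# artificial reward the text describes: 1.0 on reaching the goal room, 0.0
# otherwise. alpha (learning rate) and gamma (discount) are assumed values.
def update_score(scores, prev_room, direction, curr_room, goal_room,
                 alpha=0.5, gamma=0.9):
    reward = 1.0 if curr_room == goal_room else 0.0
    old = scores.setdefault(prev_room, {}).get(direction, 0.0)
    best_next = max(scores.get(curr_room, {}).values(), default=0.0)
    scores[prev_room][direction] = old + alpha * (reward + gamma * best_next - old)
    return scores[prev_room][direction]
```
        </preformat>
        <p>Repeated over iterations, the scores along a successful route rise, which is how the path is refined.</p>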
        <p>The Navigator module contains code that decides whether to explore a new room next (pick
randomly) or pick the adjacent room with the highest score (exploit environmental knowledge).
The url of the next room to move to is passed to the PathFinder subagent, which updates its
belief on the current url and begins the next step of the current iteration.</p>
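        <p>The explore-or-exploit choice can be sketched as follows; the epsilon value and the tie-breaking behaviour are assumptions for illustration:</p>
        <preformat>
```python
import random

# Epsilon-greedy room selection: with probability epsilon pick a random
# adjacent room (explore), otherwise follow the highest-scoring direction
# (exploit). The epsilon default is an illustrative assumption.
def choose_next(adjacent, room_scores, epsilon=0.2, rng=random):
    """adjacent maps direction to room url; room_scores maps direction to score."""
    if rng.random() > 1.0 - epsilon or not room_scores:
        direction = rng.choice(sorted(adjacent))            # explore
    else:
        direction = max(room_scores, key=room_scores.get)   # exploit
    return adjacent[direction]
```
        </preformat>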
        <p>The Navigator module also checks whether the goal room has been reached, or whether the
number of steps taken to try and find the goal room is too high, and ends the process. This
may happen in more complex environments, or when the room does not exist. It signifies the
end of the process to the PathFinder subagent by passing back a url of "end", which resets the
current url belief to the starting url, and increments the number of iterations. Over successive
iterations, the path is refined. Once complete, PathFinder gets the path from the Navigator
module and informs Navi via a message. Navi posts the path (Path) to the Query Manager.
PathFinder sends its schematic knowledge (LocationStore) to Navi, to share with any new
PathFinder subagents. If the goal room was not found over all iterations, an error is returned to
Navi, which is posted to the Query Manager.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Knowledge and Learning</title>
      <p>Knowledge and learning in the agents is in memory and not persisted beyond the lifespan of
the program. In addition to objects used in the RDF Knowledge Store, three key objects for
navigation are defined and shared across agents, Location, LocationStore and Path.</p>
      <p>Location represents a room in the maze, storing the room url, name and map of adjacent
locations (direction-room url pairs). It also has a map which represents its score in the
reinforcement learning process. The object is created in the subagent’s instance of the Navigator module,
which caches the objects during the learning process. At the end of learning, the Location
objects have their scores removed (as they are individual to subagent goal rooms) and as a
collection are passed from the module to the subagent in the LocationStore object. This is
stored as a belief, and sent to Navi via a message. Navi amalgamates its existing copy (if it has
one) with new Location objects, and shares its LocationStore with any new PathFinder
subagents. The result is that a URL need only be read once. The Location object thus has a
dual purpose: acting as both global schematic knowledge and facilitating the individual learning
process.</p>
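      <p>Navi's amalgamation step might look like the following sketch, where Location is reduced to a name plus adjacency map and the keep-first merge policy reflects the read-once behaviour described above:</p>
      <preformat>
```python
# Sketch of Navi merging a subagent's LocationStore into its own. Each store
# maps a room url to a stripped-down Location (name plus adjacency map); the
# exact object shape is an assumption for illustration.
def merge_location_stores(navi_store, incoming):
    for url, location in incoming.items():
        if url not in navi_store:
            # a url need only be read once: rooms already known are kept as-is
            navi_store[url] = location
    return navi_store
```
      </preformat>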
      <p>Path object: At the end of the learning process, the Navigator module returns a Path to the
PathFinder subagent, which stores it as a belief. This is an ordered list of pairs of direction:
room name, starting with the entry point. Printing the list forms a human-readable description
of steps with directions from the entry point to the goal room. This is shared with Navi, and
returned to the user.</p>
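      <p>Rendering the Path as the step list described above might look like this; the exact wording of each step is an illustrative assumption:</p>
      <preformat>
```python
# Turn the ordered (direction, room name) pairs into the human-readable
# description of steps from the entry point to the goal room.
def describe_path(path):
    steps = ["Start at the entry point."]
    for direction, room in path:
        steps.append("Go " + direction + " to " + room + ".")
    return "\n".join(steps)

# describe_path([("north", "hall"), ("east", "library")]) produces:
# Start at the entry point.
# Go north to hall.
# Go east to library.
```
      </preformat>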
    </sec>
    <sec id="sec-4">
      <title>4. Existing work and challenges</title>
      <p>
        The combination of BDI agents with semantic web technology has been broached with JASDL
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This uses an annotation system in a way that is similar to the way our approach uses
ASTRA modules. However, JASDL can be seen as a tighter integration, with strong coupling to
an agent’s beliefs. In the system presented in this paper, the use of modules is more akin to
adding a skill to the agent: one that informs, but neither integrates with nor enforces the
consistency of, its beliefs. It is a more loosely coupled integration. Further, as it does not interact with the
agent’s belief base, it can be used on bigger datasets, and could be extended to use external
databases to further this capability.
      </p>
      <p>
        There is much existing work on the incorporation of learning into BDI agents (for example
detailed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), and in particular reinforcement learning (for example [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <p>
        A key challenge of the system is the scalability of the reinforcement learning process. It is
met in part by the agents sharing schematic knowledge, which reduces the need to re-map the
maze. In terms of the size of the data, a potential solution would be to introduce a module that
interfaces with a database in place of the LocationStore. Another issue is the complication
of configuring the reinforcement learning process. In addition, reinforcement learning doesn’t
scale well when an agent needs to perform a diverse set of tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
          ]. These present a challenge when
reusing the agents for different environments. A potential solution is the dynamic insertion
of modules at runtime, to explore the semantic environment and perform ontology discovery,
learn which schemas are required, then download and parse them from a schema repository.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research is funded under the SFI Strategic Partnership Programme (16/SPP/3296) and is
co-funded by Origin Enterprises plc.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>A review of the semantic web field</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <article-title>A new look at the semantic web</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>35</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>O'Neill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lillis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>O'Hare</surname>
          </string-name>
          ,
          <article-title>MAMS: Multi-agent microservices</article-title>
          ,
          <source>in: Companion Proceedings of The 2019 World Wide Web Conference</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>655</fpage>
          -
          <lpage>662</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Airiau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Padgham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sardina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <article-title>Incorporating learning in BDI agents</article-title>
          ,
          <source>in: Workshop AAMAS: Adaptive and Learning Agents and MAS (ALAMAS+ALAg), ACM, Estoril</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Bordini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Fallah Seghrouchni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hindriks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Logan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <article-title>Agent programming in the cognitive era</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lillis</surname>
          </string-name>
          ,
          <article-title>Reflecting on agent programming with AgentSpeak(L)</article-title>
          ,
          <source>in: International Conference on Principles and Practice of Multi-Agent Systems</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>351</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Barto</surname>
          </string-name>
          ,
          <source>Reinforcement learning: An introduction</source>
          , MIT Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <article-title>From programming agents to educating agents-a jason-based framework for integrating learning in the development of cognitive agents</article-title>
          ,
          <source>in: International Workshop on Engineering Multi-Agent Systems</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>175</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.-H.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-S.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tapanuj</surname>
          </string-name>
          ,
          <article-title>A hybrid agent architecture integrating desire, intention and reinforcement learning</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>38</volume>
          (
          <year>2011</year>
          )
          <fpage>8477</fpage>
          -
          <lpage>8487</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <collab>VMware, Inc.</collab>
          , Spring Boot,
          <year>2021</year>
          . URL: https://spring.io/projects/spring-boot.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <collab>The Apache Software Foundation</collab>
          , Apache Jena,
          <year>2021</year>
          . URL: https://jena.apache.org/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Klapiscak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Bordini</surname>
          </string-name>
          ,
          <article-title>JASDL: A practical programming approach combining agent and semantic web technologies</article-title>
          ,
          <source>in: International Workshop on Declarative Agent Languages and Technologies</source>
          , Springer,
          <year>2008</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Florensa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Held</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <article-title>Automatic goal generation for reinforcement learning agents</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Dy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1515</fpage>
          -
          <lpage>1528</lpage>
          . URL: https://proceedings.mlr.press/v80/florensa18a.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>