<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Cognitive AI</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Cognitive Architecture for Integrated Robot Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohan Sridharan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intelligent Robotics Lab, School of Computer Science, University of Birmingham</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>13</volume>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper describes an integrated architecture for robots that combines knowledge-based and data-driven methods for transparent reasoning, control, and learning. Specifically, the architecture builds on the principle of step-wise iterative refinement to support non-monotonic logical reasoning and probabilistic reasoning with tightly-coupled transition diagrams of the domain at different resolutions. Reasoning with prior domain knowledge and heuristic methods guide the interactive learning and revision of knowledge in the form of axioms governing change, predictive models controlling the robot's movement, and predictive models of the behavior of other agents. Furthermore, the interplay between these components is used to embed the principles of explainable agency, enabling a robot to provide on-demand relational descriptions of its decisions and beliefs in response to different types of questions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>[Figure 1: (left) The refinement-based architecture, with the logician performing non-monotonic logical reasoning over coarser-resolution representations (resolutions 1 through i) and the statistician reasoning with probabilistic models of uncertainty over finer-resolution representations (resolutions i+1 through N), supported by commonsense knowledge, theories of cognition, interactive learning, and observed outcomes of execution. (right) The reasoning-guided learning pipeline on a Baxter robot: real scenes and processed text are mapped through an ASP program, feature extraction, and decision tree induction to plans, output labels (occlusion, stability), new axioms, and relational explanations.]</p>
      <p>
        Statements in an action language are used to describe these transition diagrams in the form of a sorted
signature with statics, fluents, and actions, and three types of axioms governing
domain dynamics: (deterministic or non-deterministic) causal laws, state constraints, and executability
conditions. The domain’s history includes the robot’s observations, action executions, and prioritized
defaults in the initial state. For any given task, the robot plans and executes actions at two resolutions, but is
able to construct on-demand relational descriptions of decisions at other resolutions.
Knowledge representation and reasoning: The prior domain knowledge that the robot
represents (as relational statements) and reasons with in the coarse resolution includes cognitive
theories. For example, in addition to reasoning about the attributes and default room location
of objects, a robot in an office building also considers an adaptive theory of intentions encoding
principles of non-procrastination and persistence to respond quickly to unexpected successes and
failures. The fine-resolution transition diagram is defined as a refinement of the coarse-resolution
diagram, with a theory of observations modeling the robot’s ability to sense the values of domain
fluents. A robot in an office building now considers grid cells in rooms and object parts, attributes
that were previously abstracted away, and reasons about knowledge fluents whose values are
changed by observation actions. The definition of refinement guarantees that for any given
coarse-resolution transition, there exists a path in the fine-resolution diagram between states that
are refinements of the coarse-resolution states. Also, the refined diagram is randomized to model
non-determinism. For any given goal, a plan of intentional abstract actions is obtained at the
coarse-resolution through non-monotonic logical reasoning by translating the action language
description to an Answer Set Programming (ASP) program and solving it [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The robot implements
each abstract transition as a sequence of concrete actions by automatically zooming to and
reasoning with the relevant part of the fine-resolution diagram. The fine-resolution reasoning
and execution use probabilistic models of uncertainty (e.g., in perception and actuation) and
relevant methods, adding outcomes to coarse-resolution history for subsequent reasoning [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
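As a rough illustration of this zooming step, the sketch below restricts fine-resolution reasoning to the cells of the rooms mentioned by one abstract move action. This is only a minimal sketch: the room layout, cell names, and action format are invented here and are not the architecture's actual encoding.

```python
# Hypothetical fine-resolution layout: each room refines into grid cells.
FINE_CELLS = {
    "kitchen": ["k1", "k2"],
    "office": ["o1", "o2"],
    "library": ["l1", "l2"],
}

def zoom(abstract_action):
    """Return the fine-resolution cells relevant to one abstract action.

    For a move, only the cells of the source and destination rooms are
    relevant; everything else is abstracted away before fine-resolution
    reasoning.
    """
    kind, _robot, src, dst = abstract_action
    if kind == "move":
        return FINE_CELLS[src] + FINE_CELLS[dst]
    return []

print(zoom(("move", "rob1", "kitchen", "office")))  # -> ['k1', 'k2', 'o1', 'o2']
```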
Interactive learning, control, and transparency: It is often difficult to use state-of-the-art
machine learning methods (e.g., based on deep networks) to revise the robot’s knowledge over
time. These methods require many training examples and considerable computational resources
that are not available in many robot domains. Our architecture supports three strategies for
incremental, efficient acquisition of previously unknown action capabilities and axioms: (i)
verbal descriptions of observed behavior; (ii) active exploration of new transitions; and (iii)
reactive exploration of unexpected transitions. These strategies are formulated as interactive
(e.g., inductive, reinforcement) learning problems. Reasoning and learning guide each other,
enabling the robot to automatically identify and use the relevant information to construct
mathematical models for these formulations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For example, to estimate the stability of objects
in a scene, the robot first attempts to reason with domain knowledge and spatial relations
extracted from input images. Relevant regions of interest are automatically extracted from
images for which reasoning is unable to make a decision (or makes an incorrect decision), and
used to train a data-driven model (e.g., a deep network) for stability estimation. Information
from these regions also induces axioms used for subsequent reasoning—Figure 1(right) provides
an overview of this architecture. This approach substantially improves reliability and efficiency
in comparison with data-driven models [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. Our architecture supports a similar approach
to address the discontinuous interaction dynamics experienced by a robot making and breaking
contacts with objects and surfaces (e.g., while cleaning a table). The robot learns from a few trials
to predict contact regions and end-effector measurements, using the error between prediction
and measurements to adapt control laws in order to ensure smooth motion [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
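The coarse-resolution planning step can be illustrated with a toy stand-in. The sketch below encodes causal laws and executability conditions for a hypothetical two-room domain and finds a shortest plan by breadth-first search; the architecture itself translates the action language description to ASP and uses a solver, and all names here (rooms, fluents, actions) are invented for illustration.

```python
from collections import deque

ROOMS = ["kitchen", "office"]

def actions(state):
    """Enumerate actions executable in `state` (executability conditions)."""
    loc, cup_loc, holding = state
    for room in ROOMS:
        if room != loc:
            yield ("move", room)
    # impossible pickup(cup) if the robot and cup are in different rooms
    if not holding and cup_loc == loc:
        yield ("pickup", "cup")

def apply(state, action):
    """Causal laws: the direct effects of each action."""
    loc, cup_loc, holding = state
    if action[0] == "move":
        # a held cup moves along with the robot
        return (action[1], action[1] if holding else cup_loc, holding)
    return (loc, cup_loc, True)  # pickup

def plan(start, goal):
    """Breadth-first search for a shortest plan (stand-in for ASP solving)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal(state):
            return steps
        for a in actions(state):
            nxt = apply(state, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + [a]))

# State: (robot location, cup location, holding cup?).
# Goal: the robot holds the cup and is in the office.
start = ("office", "kitchen", False)
print(plan(start, lambda s: s[2] and s[0] == "office"))
# -> [('move', 'kitchen'), ('pickup', 'cup'), ('move', 'office')]
```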
      <p>
        Our architecture supports explainable agency, i.e., transparent reasoning and learning that
makes contact with human concepts such as goals and beliefs. It encodes a theory of
explanations comprising: (i) claims about representing, reasoning with, and learning knowledge to
support relational descriptions of decisions; (ii) a characterization of explanations based on
representational abstraction, and explanation specificity and verbosity; and (iii) a methodology
for constructing such explanations. This theory is implemented in conjunction with the
components summarized above—see Figure 1(right). The robot then provides on-demand relational
descriptions of decisions and beliefs in response to different types of questions (e.g.,
descriptive, contrastive, counterfactual) posed by a human. The human is able to interactively obtain
descriptions at the desired abstraction, specificity, and verbosity, with the robot automatically
constructing and posing disambiguation questions to the human as needed [
        <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
        ].
Ad hoc teamwork: The final component of our architecture enables collaboration without
prior coordination, known as ad hoc teamwork (AHT), with the ad hoc robot (agent) selecting
and executing actions to collaborate with teammates it has not worked with before. This robot
performs non-monotonic logical reasoning with prior commonsense domain knowledge and
predictive models of the behavior of the other agents (i.e., teammates and opponents). Our
architecture encodes the principle of ecological rationality, which builds on the principle of
bounded rationality and focuses on using heuristic methods for adaptive satisficing in decision
making [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For example, the models predicting the behavior of other agents in benchmark
multiagent collaboration domains are learned and revised rapidly using an ensemble of fast
and frugal trees, with the performance of the team being better than (or comparable with) that
provided by state-of-the-art deep network methods that require orders of magnitude more
training examples and computational resources [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
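As a hedged illustration of such a frugal model, the sketch below encodes a fast-and-frugal tree: every cue except the last has one immediate "exit" prediction, so most decisions consult only one or two cues. The cues and predicted actions are invented for illustration and are not the learned models from our experiments.

```python
def fft_predict(obs):
    """Predict a teammate's next action with a fast-and-frugal tree.

    Each cue but the last has a single exit branch that predicts
    immediately; only the remaining branch consults the next cue.
    """
    if not obs["has_ball"]:   # cue 1: exit on False
        return "reposition"
    if obs["near_goal"]:      # cue 2: exit on True
        return "shoot"
    # final cue: both branches predict
    return "pass" if obs["opponent_close"] else "dribble"

print(fft_predict({"has_ball": True, "near_goal": False, "opponent_close": False}))
# -> dribble
```

Because each tree is just a handful of cue checks, an ensemble of such trees can be learned and revised from very few examples, which is what makes this approach so frugal relative to deep networks.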
    </sec>
    <sec id="sec-2">
      <title>3. Execution Traces and Results</title>
      <p>The following execution traces demonstrate some capabilities of our architecture.</p>
      <sec id="sec-2-1">
        <title>Execution Example 1. [Planning and learning]</title>
        <p>The robot in the  is asked to bring a cup to the , i.e., the goal state contains:
(, ),  _ℎ(1, ), where  is a .</p>
        <p>• The computed plan of abstract actions is:
(1, ℎ), (1, ),
(1, ), (1, )
which uses the default knowledge that cups are usually in the ℎ next to the .
• To implement each abstract transition, the robot zooms to the relevant fine-resolution
knowledge, e.g., only cells in the  and ℎ are relevant to the first  action.
• The zoomed description is used to obtain a probabilistic policy that is invoked repeatedly
to execute a sequence of concrete actions that implements the abstract action, e.g., robot
is in a cell in the ℎ after first . Other actions are executed in a similar manner.
• The robot’s attempt to pick up a cup in the kitchen fails. The robot observes that the cup
is ℎ while its arm is ℎ. It then learns the executability condition:
impossible (1, ) if (1, ℎ),
_ℎ(, ℎ)</p>
        <p>Any such learned axiom is merged with the existing knowledge.
• The robot also provides on-demand explanations at a suitable level of abstraction.</p>
        <p>Human: “Please describe the executed plan in detail.”
Robot: “I moved to cell 2 in the ℎ. I picked the large cup by its handle from the
counter [...] I moved to cell 4 of the . I put the cup down on the red table.”</p>
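The axiom-learning step in this example can be sketched with a simple set-difference heuristic standing in for the architecture's inductive learning: keep the literals that held in every failed execution of an action but in no successful one, and propose them as the body of an executability condition. The predicate names below are hypothetical.

```python
def induce_condition(failure_states, success_states):
    """Candidate body literals for `impossible(action) if <literals>`."""
    # literals common to every observed failure...
    common = set.intersection(*map(set, failure_states))
    # ...minus any literal that also held in a successful execution
    for s in success_states:
        common -= set(s)
    return common

failures = [
    {"heavy(cup1)", "arm(weak)", "loc(rob1, kitchen)"},
    {"heavy(cup1)", "arm(weak)", "loc(rob1, office)"},
]
successes = [
    {"arm(weak)", "loc(rob1, kitchen)"},  # picking up a light cup worked
]
print(sorted(induce_condition(failures, successes)))  # -> ['heavy(cup1)']
```

Any axiom induced this way is merged with the existing knowledge (duplicates and weaker variants are discarded) and used in subsequent reasoning.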
      </sec>
      <sec id="sec-2-2">
        <title>Execution Example 2. [Explain plans and beliefs]</title>
        <p>In the scene in the first two images of Figure 2, the robot’s goal is to achieve a state in which
the red cube is on top of the orange cube. The following interaction takes place after the robot
executes a plan to achieve the goal.</p>
        <p>• Human: "Please describe the plan."</p>
        <p>Baxter: "I picked up the blue cube. I put the blue cube on the table. I picked up the
orange cube. I put the orange cube on the table. I picked up the red cube. I put the red
cube on the orange cube."
• The robot may have to justify a particular action.</p>
        <p>Human: "Why did you pick up the blue cube at step 0?"
Baxter: "Because I had to pick up the red cube, and it was below the blue cube."
This answer is also used to automatically highlight the relevant image regions that
influenced this answer, providing additional transparency—see Figure 2(right).
• The robot may have to answer a contrastive question.</p>
        <p>Human: "Why did you not put down the orange cube on the blue cube?"
Baxter: "Because the blue cube is small." In the absence of any reference to a particular
time step, the robot considers the single instance (in the executed plan) of putting the
orange cube on another cube or surface. The answer is based on learned default knowledge
that any structure with a large object on a small object is unstable.
• The human may ask the robot to justify beliefs.</p>
        <p>Human: "Why did you believe the red cube was below the blue cube in the initial state?"
Baxter: "Because I observed the red cube below the blue cube in step 0."
• The robot can run mental simulations to answer counterfactual questions.</p>
        <p>Human: “What would happen if the ball is rolled?”</p>
        <p>Baxter: “The structure of blocks would be unstable”.</p>
        <p>
          For more extensive evaluation of our architecture’s capabilities, we also used complex simulation
environments. For example, for evaluating the AHT capability, we explored the Fort Attack (FA)
domain [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and Half Field Offense (HFO) domain [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]—last two images of Figure 2—benchmarks
for multiagent collaboration. In FA, guards (in green, including one ad hoc agent) had to protect
a fort from attackers (in red). Any episode ended when all members of a team were killed, an
attacker reached the fort, or guards protected the fort for a sufficient time period. Each agent
could move in a particular direction or shoot an opponent within a range. In HFO, members of
the offense team (including one ad hoc agent) had to score a goal against a team of defenders
and one goalkeeper; the game ended when the offense team scored a goal, a defender gained
possession of the ball, the ball went out of bounds, or a maximum time limit was exceeded. Each
agent could dribble the ball, pass to another agent, or kick the ball toward the goal. We were
able to experimentally demonstrate that our architecture enables the ad hoc agent to: (i) adapt to
diferent teammate and opponent types, and to changes in team composition; (ii) incrementally
learn and revise other agents’ behavioral models from limited examples; (iii) improve team
performance in comparison with a state-of-the-art data-driven method that involved deep
reinforcement learning in graph neural networks; and (iv) generate relational descriptions as
explanations of its decisions and beliefs in response to different types of questions.
        </p>
        <p>
          Complete details and experimental results of evaluating our architecture in simulation and
on physical robots are described in relevant papers [
          <xref ref-type="bibr" rid="ref10 ref12 ref13 ref2 ref3 ref4 ref6 ref8">2, 3, 4, 6, 8, 10, 12, 13</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>The architecture described in this paper is the result of research threads pursued in collaboration
with Hasra Dodampegama, Michael Gelfond, Rocio Gomez, Ben Meadows, Tiago Mota, Heather
Riley, Saif Sidhik, Jeremy Wyatt, and Shiqi Zhang. This work was supported in part by the U.S.
Office of Naval Research Awards N00014-13-1-0766, N00014-17-1-2434 and N00014-20-1-2390,
the Asian Office of Aerospace Research and Development award FA2386-16-1-4071, and the
U.K. Engineering and Physical Sciences Research Council award EP/S032487/1. All conclusions
reported in this paper are those of the author alone.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gebser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaminski</surname>
          </string-name>
          , B. Kaufmann, T. Schaub, Answer Set Solving in Practice,
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          , Morgan Claypool Publishers,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Riley</surname>
          </string-name>
          ,
          <article-title>What do you really want to do? Towards a Theory of Intentions for Human-Robot Collaboration</article-title>
          ,
          <source>Annals of Mathematics and Artificial Intelligence</source>
          ,
          <source>special issue on commonsense reasoning 89</source>
          (
          <year>2021</year>
          )
          <fpage>179</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Wyatt,
          <article-title>REBA: A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>65</volume>
          (
          <year>2019</year>
          )
          <fpage>87</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Meadows</surname>
          </string-name>
          ,
          <article-title>Knowledge Representation and Interactive Learning of Domain Knowledge for Human-Robot Collaboration</article-title>
          ,
          <source>Advances in Cognitive Systems</source>
          <volume>7</volume>
          (
          <year>2018</year>
          )
          <fpage>77</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , T. Mota,
          <article-title>Towards Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>37</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leonardis</surname>
          </string-name>
          ,
          <article-title>Integrated Commonsense Reasoning and Deep Learning for Transparent Decision Making in Robotics</article-title>
          ,
          <source>Springer Nature CS</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Riley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <article-title>Integrating Non-monotonic Logical Reasoning and Inductive Learning With Deep Learning for Explainable Visual Question Answering</article-title>
          ,
          <source>Frontiers in Robotics and AI, special issue on Combining Symbolic Reasoning and Data-Driven Learning for Decision-Making 6</source>
          (
          <year>2019</year>
          )
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sidhik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ruiken</surname>
          </string-name>
          ,
          <article-title>Towards a Framework for Changing-Contact Manipulation Tasks</article-title>
          ,
          <source>in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sidhik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wyatt</surname>
          </string-name>
          ,
          <article-title>Online Learning of FeedForward Models for Task-Space Variable Impedance Control</article-title>
          ,
          <source>in: IEEE-RAS International Conference on Humanoid Robotics</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Meadows</surname>
          </string-name>
          ,
          <article-title>Towards a Theory of Explanations for Human-Robot Collaboration</article-title>
          ,
          <source>Kunstliche Intelligenz</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>331</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gigerenzer</surname>
          </string-name>
          , What is Bounded Rationality?, in: Routledge Handbook of Bounded Rationality, Routledge,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dodampegama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <article-title>Knowledge-based Reasoning and Learning under Partial Observability in Ad Hoc Teamwork</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>696</fpage>
          -
          <lpage>714</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dodampegama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , Back to the Future:
          <article-title>Toward a Hybrid Architecture for Ad Hoc Teamwork</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence</source>
          , Washington DC, USA,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sycara</surname>
          </string-name>
          ,
          <article-title>Natural Emergence of Heterogeneous Strategies in Artificially Intelligent Competitive Teams</article-title>
          ,
          <source>Technical Report</source>
          , https://arxiv.org/abs/2007.03102,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausknecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mupparaju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kalyanakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <article-title>Half field offense: An environment for multiagent learning and ad hoc teamwork</article-title>
          ,
          <source>in: AAMAS Adaptive Learning Agents Workshop</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>