A Cognitive Architecture for Integrated Robot Systems

Mohan Sridharan¹
¹ Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK


Abstract
This paper describes an integrated architecture for robots that combines knowledge-based and
data-driven methods for transparent reasoning, control, and learning. Specifically, the
architecture builds on the principle of step-wise iterative refinement to support non-monotonic
logical reasoning and probabilistic reasoning with tightly-coupled transition diagrams of the
domain at different resolutions. Reasoning with prior domain knowledge and heuristic methods
guide the interactive learning and revision of knowledge in the form of axioms governing change,
predictive models controlling the robot’s movement, and predictive models of the behavior of
other agents. Furthermore, the interplay between these components is used to embed the principles
of explainable agency, enabling a robot to provide on-demand relational descriptions of its
decisions and beliefs in response to different types of questions.


1. Motivation
Consider a robot delivering objects or stacking objects in desired configurations. Such robots
have to reason with different descriptions of prior domain knowledge and uncertainty. These
descriptions include commonsense knowledge, e.g., relations between some domain objects
and default statements such as “textbooks are usually in the library” that hold true in all but a
few exceptional circumstances. Also, information extracted from noisy sensor inputs is often
associated with quantitative measures of uncertainty, e.g., “I am 90% certain the robotics book is
in the office”. In addition, the robot will have to revise its theory of actions and change over time,
often using data-driven methods and noisy observations. Furthermore, for effective collaboration
with other agents (e.g., humans, robots), the robot will need to reason with incrementally-revised
models of the behavior of these agents, and provide on-demand descriptions of its decisions
such that they make contact with human-level concepts such as goals and beliefs. In state of the
art architectures that combine knowledge-based reasoning (e.g., for planning) and data-driven
learning (e.g., for object recognition) for such integrated robot systems, the desired behavior
thus poses open problems in knowledge representation, reasoning, control, and learning. This
paper summarizes the capabilities of an architecture designed to address these problems.

2. Architecture and Insights
Figure 1(left) is an overview of the architecture that encodes the principle of stepwise iterative
refinement. It is based on tightly-coupled transition diagrams at different resolutions, and may
be viewed as a logician, statistician, and an explorer working together. Statements in an action

Cognitive AI 2023, 13th-15th November, 2023, Bari, Italy.
Email: m.sridharan@bham.ac.uk (M. Sridharan)
URL: https://www.cs.bham.ac.uk/~sridharm/ (M. Sridharan)
ORCID: 0000-0001-9922-8969 (M. Sridharan)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


[Figure 1 (left): representations at resolutions 1 (coarser) through N (finer); a logician
(commonsense knowledge, theories of cognition and learning; non-monotonic logical reasoning), a
statistician (probabilistic models of uncertainty; probabilistic execution), and an explorer
(interactive learning) exchange abstract intentional transitions and observed outcomes.
Figure 1 (right): inputs (simulated scenes, real scenes, training-phase labels, human query) pass
through feature extraction, decision tree induction, an ASP program, a classification block,
text/audio processing, and a program analyzer to produce output labels (occlusion, stability) and
explanations (relational descriptions).]

Figure 1: Architecture represents and reasons with transition diagrams at different resolutions,
combining strengths of declarative programming, probabilistic reasoning, and interactive learning.
language are used to describe these diagrams in the form of a sorted signature with statics,
fluents, and actions; and three types of (deterministic, non-deterministic) axioms governing
domain dynamics: causal laws, state constraints, and executability conditions. The domain’s
history includes the robot’s observations, action executions, and prioritized defaults in the
initial state. For any given task, the robot plans and executes actions at two resolutions, but is
able to construct on-demand relational descriptions of decisions at other resolutions.
Knowledge representation and reasoning: The prior domain knowledge that the robot
represents (as relational statements) and reasons with in the coarse resolution includes cognitive
theories. For example, in addition to reasoning about the attributes and default room location
of objects, a robot in an office building also considers an adaptive theory of intentions encoding
principles of non-procrastination and persistence to respond quickly to unexpected successes and
failures. The fine-resolution transition diagram is defined as a refinement of the coarse-resolution
diagram, with a theory of observations modeling the robot’s ability to sense the values of domain
fluents. A robot in an office building now considers grid cells in rooms and object parts, attributes
that were previously abstracted away, and reasons about knowledge fluents whose values are
changed by observation actions. The definition of refinement guarantees that for any given
coarse-resolution transition, there exists a path in the fine-resolution diagram between states that
are refinements of the coarse-resolution states. Also, the refined diagram is randomized to model
non-determinism. For any given goal, a plan of intentional abstract actions is obtained at the
coarse resolution through non-monotonic logical reasoning, by translating the action language
description to an Answer Set Programming (ASP) program and solving it [1]. The robot implements
each abstract transition as a sequence of concrete actions by automatically zooming to and
reasoning with the relevant part of the fine-resolution diagram. The fine-resolution reasoning
and execution uses probabilistic models of uncertainty (e.g., in perception and actuation) and
relevant methods, adding outcomes to coarse-resolution history for subsequent reasoning [2, 3].
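
The coarse-resolution planning step can be illustrated with a minimal Python sketch. It is not the architecture's implementation: a plain breadth-first search stands in for the ASP solver, the names `State`, `executable`, `result`, and `plan` are hypothetical, and the default that cups are usually in the kitchen is simply written into the initial state rather than encoded as a prioritized default.

```python
from collections import deque
from typing import Callable, List, NamedTuple, Optional, Tuple

ROOMS = ("study", "kitchen")
Action = Tuple[str, str]

class State(NamedTuple):
    rob_loc: str    # fluent loc(rob1, Room)
    cup_loc: str    # fluent loc(cup, Room)
    in_hand: bool   # fluent in_hand(rob1, cup)

def executable(state: State, action: Action) -> bool:
    """Executability conditions: False when the action is impossible in this state."""
    name, arg = action
    if name == "move":
        return state.rob_loc != arg
    if name == "pickup":
        return not state.in_hand and state.rob_loc == state.cup_loc
    if name == "putdown":
        return state.in_hand
    return False

def result(state: State, action: Action) -> State:
    """Causal laws, plus one state constraint: a held cup moves with the robot."""
    name, arg = action
    if name == "move":
        cup_loc = arg if state.in_hand else state.cup_loc
        return State(arg, cup_loc, state.in_hand)
    if name == "pickup":
        return state._replace(in_hand=True)
    return state._replace(in_hand=False)   # putdown

ACTIONS: List[Action] = [("move", r) for r in ROOMS] + [("pickup", "cup"), ("putdown", "cup")]

def plan(initial: State, goal: Callable[[State], bool]) -> Optional[List[Action]]:
    """Breadth-first search over the transition diagram; returns a shortest plan."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if goal(state):
            return steps
        for action in ACTIONS:
            if executable(state, action):
                nxt = result(state, action)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [action]))
    return None

# Assumed initial state: robot in the study, cup (by default) in the kitchen.
initial = State(rob_loc="study", cup_loc="kitchen", in_hand=False)
steps = plan(initial, lambda s: s.cup_loc == "study" and not s.in_hand)
print(steps)
```

In this toy domain the search returns the four-step sequence of move, pickup, move, and putdown. An ASP encoding would additionally support defaults and their exceptions, which this sketch does not attempt.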
Interactive learning, control, and transparency: It is often difficult to use state of the art
machine learning methods (e.g., based on deep networks) to revise the robot’s knowledge over
time. These methods require many training examples and considerable computational resources
that are not available in many robot domains. Our architecture supports three strategies for
incremental, efficient acquisition of previously unknown action capabilities and axioms: (i)
verbal descriptions of observed behavior; (ii) active exploration of new transitions; and (iii)
reactive exploration of unexpected transitions. These strategies are formulated as interactive
(e.g., inductive, reinforcement) learning problems. Reasoning and learning guide each other,
enabling the robot to automatically identify and use the relevant information to construct
mathematical models for these formulations [4]. For example, to estimate the stability of objects
in a scene, the robot first attempts to reason with domain knowledge and spatial relations
extracted from input images. Relevant regions of interest are automatically extracted from
images for which reasoning is unable to make a decision (or makes an incorrect decision), and
used to train a data-driven model (e.g., a deep network) for stability estimation. Information
from these regions also induces axioms used for subsequent reasoning—Figure 1(right) provides
an overview of this architecture. This approach substantially improves reliability and efficiency
in comparison with data-driven models [5, 6, 7]. Our architecture supports a similar approach
to address the discontinuous interaction dynamics experienced by a robot making and breaking
contacts with objects and surfaces (e.g., while cleaning a table). The robot learns from a few trials
to predict contact regions and end-effector measurements, using the error between prediction
and measurements to adapt control laws in order to ensure smooth motion [8, 9].
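
The interplay between reasoning and learning for stability estimation can be sketched as follows. This is an illustrative simplification under stated assumptions: scenes are reduced to a list of object sizes from bottom to top, the rule that a lone object is stable is hypothetical, and a lookup table stands in for the deep network. The function names are not from the architecture.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Tower = Sequence[str]   # object sizes listed from bottom to top

def reason_about_stability(tower: Tower) -> Optional[bool]:
    """Return True (stable), False (unstable), or None when the rules are silent."""
    if len(tower) <= 1:
        return True                  # assumed rule: a lone object is stable
    for lower, upper in zip(tower, tower[1:]):
        if lower == "small" and upper == "large":
            return False             # a large object on a small object is unstable
    return None                      # domain knowledge is insufficient to decide

def select_for_learning(labeled: List[Tuple[Tower, bool]]) -> List[Tuple[Tower, bool]]:
    """Keep only scenes where reasoning is undecided or disagrees with the label."""
    return [(tower, label) for tower, label in labeled
            if reason_about_stability(tower) != label]

def estimate_stability(tower: Tower, learned_model: Callable[[Tower], bool]) -> bool:
    """Reason first; consult the data-driven model only when reasoning is undecided."""
    decision = reason_about_stability(tower)
    return learned_model(tower) if decision is None else decision

# Hypothetical labeled training scenes.
scenes = [
    (("small", "large"), False),              # reasoning decides correctly
    (("large",), True),                       # reasoning decides correctly
    (("medium", "medium", "medium"), False),  # reasoning is undecided
    (("large", "small"), True),               # reasoning is undecided
]
hard_cases = select_for_learning(scenes)

# Placeholder "model": memorizes the hard cases; a real system would train a network.
memory = {tuple(tower): label for tower, label in hard_cases}
model = lambda tower: memory.get(tuple(tower), True)

print(estimate_stability(("medium", "medium", "medium"), model))
```

Only the two scenes that reasoning cannot resolve reach the learner, which is the source of the reduction in training data described above.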
   Our architecture supports explainable agency, i.e., transparent reasoning and learning that
makes contact with human concepts such as goals and beliefs. It encodes a theory of explanations
comprising: (i) claims about representing, reasoning with, and learning knowledge to
support relational descriptions of decisions; (ii) a characterization of explanations based on
representational abstraction, and explanation specificity and verbosity; and (iii) a methodology
for constructing such explanations. This theory is implemented in conjunction with the
components summarized above—see Figure 1(right). The robot then provides on-demand relational
descriptions of decisions and beliefs in response to different types of questions (e.g.,
descriptive, contrastive, counterfactual) posed by a human. The human is able to interactively obtain
descriptions at the desired abstraction, specificity, and verbosity, with the robot automatically
constructing and posing disambiguation questions to the human as needed [6, 10].
Ad hoc teamwork: The final component of our architecture enables collaboration without
prior coordination, known as ad hoc teamwork (AHT), with the ad hoc robot (agent) selecting
and executing actions to collaborate with teammates it has not worked with before. This robot
performs non-monotonic logical reasoning with prior commonsense domain knowledge and
predictive models of the behavior of the other agents (i.e., teammates and opponents). Our
architecture encodes the principle of ecological rationality, which builds on the principle of
bounded rationality and focuses on using heuristic methods for adaptive satisficing in decision
making [11]. For example, the models predicting the behavior of other agents in benchmark
multiagent collaboration domains are learned and revised rapidly using an ensemble of fast
and frugal trees, with the performance of the team being better than (or comparable with) that
provided by state of the art deep network methods that require orders of magnitude more
training examples and computational resources [12, 13].
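
To make the notion of an ensemble of fast and frugal trees concrete, the sketch below shows prediction only; how the trees are learned and revised is described in [12, 13] and is not reproduced here. A fast and frugal tree checks cues in a fixed order and exits with a prediction as soon as one cue fires. The feature names (`dist_to_attacker`, `dist_to_fort`), thresholds, and action labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Observation = Dict[str, float]

@dataclass
class Cue:
    test: Callable[[Observation], bool]
    exit_action: str    # prediction returned as soon as the test succeeds

@dataclass
class FastFrugalTree:
    cues: List[Cue]
    default_action: str  # returned when no cue fires

    def predict(self, obs: Observation) -> str:
        for cue in self.cues:          # cues are checked in a fixed order
            if cue.test(obs):
                return cue.exit_action
        return self.default_action

def ensemble_predict(trees: List[FastFrugalTree], obs: Observation) -> str:
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(tree.predict(obs) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical trees predicting the next action of a guard.
trees = [
    FastFrugalTree([Cue(lambda o: o["dist_to_attacker"] <= 1.0, "shoot"),
                    Cue(lambda o: o["dist_to_fort"] > 3.0, "move_to_fort")], "hold"),
    FastFrugalTree([Cue(lambda o: o["dist_to_attacker"] <= 1.5, "shoot")], "move_to_fort"),
    FastFrugalTree([Cue(lambda o: o["dist_to_fort"] > 2.0, "move_to_fort"),
                    Cue(lambda o: o["dist_to_attacker"] <= 1.0, "shoot")], "hold"),
]

near_attacker = {"dist_to_attacker": 0.8, "dist_to_fort": 1.0}
far_from_fort = {"dist_to_attacker": 4.0, "dist_to_fort": 5.0}
print(ensemble_predict(trees, near_attacker), ensemble_predict(trees, far_from_fort))
```

Each prediction inspects at most a handful of cues, which is why such models can be evaluated and revised quickly from limited examples.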

3. Execution Traces and Results
The following execution traces demonstrate some capabilities of our architecture.
Execution Example 1. [Planning and learning]
The robot in the 𝑠𝑡𝑢𝑑𝑦 is asked to bring a cup to the 𝑠𝑡𝑢𝑑𝑦, i.e., the goal state contains:
𝑙𝑜𝑐(𝐶, 𝑠𝑡𝑢𝑑𝑦), 𝑛𝑜𝑡 𝑖𝑛_ℎ𝑎𝑛𝑑(𝑟𝑜𝑏1 , 𝐶), where 𝐶 is a 𝑐𝑢𝑝.
    • The computed plan of abstract actions is:
                            𝑚𝑜𝑣𝑒(𝑟𝑜𝑏1 , 𝑘𝑖𝑡𝑐ℎ𝑒𝑛), 𝑝𝑖𝑐𝑘𝑢𝑝(𝑟𝑜𝑏1 , 𝐶),
                              𝑚𝑜𝑣𝑒(𝑟𝑜𝑏1 , 𝑠𝑡𝑢𝑑𝑦), 𝑝𝑢𝑡𝑑𝑜𝑤𝑛(𝑟𝑜𝑏1 , 𝐶)
      which uses the default knowledge that cups are usually in the 𝑘𝑖𝑡𝑐ℎ𝑒𝑛 next to the 𝑠𝑡𝑢𝑑𝑦.
    • To implement each abstract transition, the robot zooms to the relevant fine-resolution
      knowledge, e.g., only cells in the 𝑠𝑡𝑢𝑑𝑦 and 𝑘𝑖𝑡𝑐ℎ𝑒𝑛 are relevant to the first 𝑚𝑜𝑣𝑒 action.
    • The zoomed description is used to obtain a probabilistic policy that is invoked repeatedly
      to execute a sequence of concrete actions that implements the abstract action, e.g., robot
      is in a cell in the 𝑘𝑖𝑡𝑐ℎ𝑒𝑛 after first 𝑚𝑜𝑣𝑒. Other actions are executed in a similar manner.
    • The robot’s attempt to pick up a cup in the kitchen fails. The robot observes that the cup
      is ℎ𝑒𝑎𝑣𝑦 while its arm is 𝑙𝑖𝑔ℎ𝑡. It then learns the executability condition:
                          impossible 𝑝𝑖𝑐𝑘𝑢𝑝(𝑟𝑜𝑏1 , 𝐶) if 𝑎𝑟𝑚(𝑟𝑜𝑏1 , 𝑙𝑖𝑔ℎ𝑡),
                                                        𝑜𝑏𝑗_𝑤𝑒𝑖𝑔ℎ𝑡(𝐶, ℎ𝑒𝑎𝑣𝑦)
      Any such learned axiom is merged with the existing knowledge.
    • The robot also provides on-demand explanations at a suitable level of abstraction.
      Human: “Please describe the executed plan in detail.”
      Robot: “I moved to cell 𝑐2 in the 𝑘𝑖𝑡𝑐ℎ𝑒𝑛. I picked the large cup by its handle from the
      counter [...] I moved to cell 𝑐4 of the 𝑠𝑡𝑢𝑑𝑦. I put the cup down on the red table.”
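
The way a learned executability condition is merged and then used can be sketched as below. This is a simplified illustration, not the learning method of [4]: the literals relevant to the failure are passed in directly rather than identified automatically, the axiom is kept ground (for a hypothetical object `cup1`) instead of being generalized to a variable 𝐶, and all class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

Literal = Tuple[str, ...]   # e.g., ("arm", "rob1", "light")

@dataclass(frozen=True)
class ExecutabilityCondition:
    action: str
    body: FrozenSet[Literal]

    def blocks(self, action: str, state: Set[Literal]) -> bool:
        """The action is impossible when every literal in the body holds."""
        return action == self.action and self.body <= state

    def __str__(self) -> str:
        lits = ", ".join(f"{l[0]}({', '.join(l[1:])})" for l in sorted(self.body))
        return f"impossible {self.action} if {lits}"

def learn_from_failure(action: str, observed: Iterable[Literal],
                       knowledge: Set[ExecutabilityCondition]) -> ExecutabilityCondition:
    """Build an axiom from the literals observed at failure and merge it."""
    axiom = ExecutabilityCondition(action, frozenset(observed))
    knowledge.add(axiom)    # set membership keeps repeated merges idempotent
    return axiom

def is_executable(action: str, state: Set[Literal],
                  knowledge: Set[ExecutabilityCondition]) -> bool:
    return not any(axiom.blocks(action, state) for axiom in knowledge)

knowledge: Set[ExecutabilityCondition] = set()
state = {("arm", "rob1", "light"), ("obj_weight", "cup1", "heavy"), ("loc", "rob1", "kitchen")}
before = is_executable("pickup(rob1, cup1)", state, knowledge)

axiom = learn_from_failure(
    "pickup(rob1, cup1)",
    [("arm", "rob1", "light"), ("obj_weight", "cup1", "heavy")],
    knowledge,
)
after = is_executable("pickup(rob1, cup1)", state, knowledge)
print(axiom, before, after)
```

Once merged, the axiom prevents subsequent plans from including the same pickup under the same conditions, while leaving the action available when the cup is light.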

Execution Example 2. [Explain plans and beliefs]
In the scene in the first two images of Figure 2, the robot’s goal is to achieve a state in which
the red cube is on top of the orange cube. The following interaction takes place after the robot
executes a plan to achieve the goal.
    • Human: "Please describe the plan."
      Baxter: "I picked up the blue cube. I put the blue cube on the table. I picked up the
      orange cube. I put the orange cube on the table. I picked up the red cube. I put the red
      cube on the orange cube."
    • The robot may have to justify a particular action.
      Human: "Why did you pick up the blue cube at step 0?"
      Baxter: "Because I had to pick up the red cube, and it was below the blue cube."
      This answer is also used to automatically highlight the relevant image regions that
      influenced this answer, providing additional transparency—see Figure 2(right).
    • The robot may have to answer a contrastive question.
      Human: "Why did you not put down the orange cube on the blue cube?"
      Baxter: "Because the blue cube is small." In the absence of any reference to a particular
      time step, the robot considers the single instance (in the executed plan) of putting the
      orange cube on another cube or surface. The answer is based on learned default knowledge
      that any structure with a large object on a small object is unstable.
Figure 2: Scenario for some robot experiments and screenshots of simulation environments.

    • The human may ask the robot to justify beliefs.
      Human: "Why did you believe the red cube was below the blue cube in the initial state?"
      Baxter: "Because I observed the red cube below the blue cube in step 0."
    • The robot can run mental simulations to answer counterfactual questions.
      Human: “What would happen if the ball is rolled?”
      Baxter: “The structure of blocks would be unstable.”
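
One way to picture how a justification such as the answer about the blue cube could be traced from the executed plan is sketched below. It is a deliberately narrow illustration, not the methodology of [6, 10]: it handles only pickup actions, looks for a later pickup of the block that was directly beneath, and assumes a hypothetical initial configuration in which the support of the orange cube is left as the placeholder "unspecified" because the text does not state it.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, ...]   # ("pickup", block) or ("putdown", block, support)

def state_before(plan: List[Action], initial_on: Dict[str, str], step: int) -> Dict[str, str]:
    """Replay the plan up to (not including) `step`; maps each block to its support."""
    on = dict(initial_on)
    for action in plan[:step]:
        if action[0] == "pickup":
            on.pop(action[1], None)      # a held block rests on nothing
        else:
            on[action[1]] = action[2]
    return on

def justify_pickup(plan: List[Action], initial_on: Dict[str, str], step: int) -> Optional[str]:
    """Answer 'Why did you pick up X at step t?' by finding the action it enabled."""
    kind, block = plan[step][0], plan[step][1]
    if kind != "pickup":
        return None
    support = state_before(plan, initial_on, step).get(block)
    for later in plan[step + 1:]:
        if later[0] == "pickup" and later[1] == support:
            return (f"Because I had to pick up the {support} cube, "
                    f"and it was below the {block} cube.")
    return None   # no relational justification of this form was found

# The executed plan from the interaction above; initial supports are assumptions.
executed = [
    ("pickup", "blue"), ("putdown", "blue", "table"),
    ("pickup", "orange"), ("putdown", "orange", "table"),
    ("pickup", "red"), ("putdown", "red", "orange"),
]
initial_on = {"red": "table", "blue": "red", "orange": "unspecified"}

answer = justify_pickup(executed, initial_on, 0)
print(answer)
```

The architecture constructs such answers from the answer set and the relevant axioms and literals; the replay of the plan here only mimics that lookup for a single question type.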

For more extensive evaluation of our architecture’s capabilities, we also used complex simulation
environments. For example, for evaluating the AHT capability, we explored the Fort Attack (FA)
domain [14] and Half Field Offense (HFO) domain [15]—last two images of Figure 2—benchmarks
for multiagent collaboration. In FA, guards (in green, including one ad hoc agent) had to protect
a fort from attackers (in red). Any episode ended when all members of a team were killed, an
attacker reached the fort, or guards protected the fort for a sufficient time period. Each agent
could move in a particular direction or shoot an opponent within a range. In HFO, members of
the offense team (including one ad hoc agent) had to score a goal against a team of defenders
and one goalkeeper; the game ended when the offense team scored a goal, a defender gained
possession of the ball, the ball went out of bounds, or a maximum time limit was exceeded. Each
agent could dribble the ball, pass to another agent, or kick the ball toward the goal. We were
able to experimentally demonstrate that our architecture enables the ad hoc agent to: (i) adapt to
different teammate and opponent types, and to changes in team composition; (ii) incrementally
learn and revise other agents’ behavioral models from limited examples; (iii) improve team
performance in comparison with a state of the art data-driven method that involved deep
reinforcement learning in graph neural networks; and (iv) generate relational descriptions as
explanations of its decisions and beliefs in response to different types of questions.
   Complete details and experimental results of evaluating our architecture in simulation and
on physical robots are described in relevant papers [2, 3, 4, 6, 8, 10, 12, 13].

Acknowledgments
The architecture described in this paper is the result of research threads pursued in collaboration
with Hasra Dodampegama, Michael Gelfond, Rocio Gomez, Ben Meadows, Tiago Mota, Heather
Riley, Saif Sidhik, Jeremy Wyatt, and Shiqi Zhang. This work was supported in part by the U.S.
Office of Naval Research Awards N00014-13-1-0766, N00014-17-1-2434 and N00014-20-1-2390,
the Asian Office of Aerospace Research and Development award FA2386-16-1-4071, and the
U.K. Engineering and Physical Sciences Research Council award EP/S032487/1. All conclusions
reported in this paper are those of the author alone.
References
 [1] M. Gebser, R. Kaminski, B. Kaufmann, T. Schaub, Answer Set Solving in Practice, Synthesis
     Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers,
     2012.
 [2] R. Gomez, M. Sridharan, H. Riley, What do you really want to do? Towards a Theory
     of Intentions for Human-Robot Collaboration, Annals of Mathematics and Artificial
     Intelligence, special issue on commonsense reasoning 89 (2021) 179–208.
 [3] M. Sridharan, M. Gelfond, S. Zhang, J. Wyatt, REBA: A Refinement-Based Architecture for
     Knowledge Representation and Reasoning in Robotics, Journal of Artificial Intelligence
     Research 65 (2019) 87–180.
 [4] M. Sridharan, B. Meadows, Knowledge Representation and Interactive Learning of Domain
     Knowledge for Human-Robot Collaboration, Advances in Cognitive Systems 7 (2018)
     77–96.
 [5] M. Sridharan, T. Mota, Towards Combining Commonsense Reasoning and Knowledge
     Acquisition to Guide Deep Learning, Autonomous Agents and Multi-Agent Systems 37
     (2023).
 [6] T. Mota, M. Sridharan, A. Leonardis, Integrated Commonsense Reasoning and Deep
     Learning for Transparent Decision Making in Robotics, Springer Nature CS 2 (2021) 1–18.
 [7] H. Riley, M. Sridharan, Integrating Non-monotonic Logical Reasoning and Inductive
     Learning With Deep Learning for Explainable Visual Question Answering, Frontiers
     in Robotics and AI, special issue on Combining Symbolic Reasoning and Data-Driven
     Learning for Decision-Making 6 (2019) 20.
 [8] S. Sidhik, M. Sridharan, D. Ruiken, Towards a Framework for Changing-Contact Manipulation
     Tasks, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
     2021.
 [9] M. Mathew, S. Sidhik, M. Sridharan, M. Azad, A. Hayashi, J. Wyatt, Online Learning of
     Feed-Forward Models for Task-Space Variable Impedance Control, in: IEEE-RAS International
     Conference on Humanoid Robotics, 2019.
[10] M. Sridharan, B. Meadows, Towards a Theory of Explanations for Human-Robot Collaboration,
     Künstliche Intelligenz 33 (2019) 331–342.
[11] G. Gigerenzer, What is Bounded Rationality?, in: Routledge Handbook of Bounded
     Rationality, Routledge, 2020.
[12] H. Dodampegama, M. Sridharan, Knowledge-based Reasoning and Learning under Partial
     Observability in Ad Hoc Teamwork, Theory and Practice of Logic Programming 23 (2023)
     696–714.
[13] H. Dodampegama, M. Sridharan, Back to the Future: Toward a Hybrid Architecture for
     Ad Hoc Teamwork, in: AAAI Conference on Artificial Intelligence, Washington DC, USA,
     2023.
[14] A. Deka, K. Sycara, Natural Emergence of Heterogeneous Strategies in Artificially
     Intelligent Competitive Teams, Technical Report, https://arxiv.org/abs/2007.03102, 2020.
[15] M. Hausknecht, P. Mupparaju, S. Subramanian, S. Kalyanakrishnan, P. Stone, Half field
     offense: An environment for multiagent learning and ad hoc teamwork, in: AAMAS
     Adaptive Learning Agents Workshop, 2016.