Cognitive Adequacy: Insights from Developing Robot Architectures

Mohan Sridharan
Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK

Abstract
This paper discusses cognitive adequacy for robots collaborating with and assisting humans. We share insights from the development of robot architectures that use knowledge-driven and data-driven methods to jointly address challenges in transparent knowledge representation, reasoning, and learning in robotics.

Keywords
Non-monotonic logical reasoning, Probabilistic reasoning, Interactive learning, Robotics

1. Motivation

Consider a robot delivering objects to particular places or stacking objects in desired configurations. Such robots have to reason with different descriptions of incomplete domain knowledge and uncertainty. These descriptions include commonsense knowledge, e.g., relations between some domain objects, and default statements such as “textbooks are usually in the library” that hold true in all but a few exceptional circumstances. At the same time, information extracted from noisy sensor inputs is often associated with quantitative measures of uncertainty, e.g., “I am 90% certain I saw the robotics book in the office”. Also, any robot in a practical domain will have to revise its existing knowledge over time, often using data-driven methods. In addition, for effective collaboration with humans, the robot may need to describe (or justify) its decisions and beliefs. In state-of-the-art architectures for such integrated robot systems, which combine knowledge-based reasoning (e.g., for planning) and data-driven learning (e.g., for object recognition), cognitive adequacy, i.e., the ability to support the desired behavior, thus poses open problems in knowledge representation, reasoning, and learning. This paper builds on our experience in designing robot architectures [1, 2, 3] to identify some key underlying principles for cognitive adequacy.

2. Architecture and Insights

Figure 1 (left) is an overview of the architecture, which encodes the principle of stepwise iterative refinement. It is based on tightly-coupled transition diagrams at different resolutions, and may be viewed as a logician, a statistician, and an explorer working together. These diagrams are described using an action language that has a sorted signature with statics, (Boolean and non-Boolean) fluents, and actions, and supports (deterministic and non-deterministic) causal laws, state constraints, and executability conditions. The domain's history includes observations, action executions, and prioritized defaults. For any given task, the robot plans and executes actions at two resolutions, but constructs on-demand relational descriptions of decisions and beliefs at other resolutions.

[Figure 1: Architecture represents and reasons with transition diagrams at different resolutions, combining strengths of declarative programming, probabilistic reasoning, and interactive learning.]
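To make these constructs concrete, the following is a minimal sketch, in clingo-style Answer Set Prolog syntax, of a causal law, a state constraint, an executability condition, and the default persistence (inertia) axioms. The sorts and relation names (e.g., loc, in_hand, move) are hypothetical stand-ins chosen for illustration; they are simplified relative to the action descriptions used in the architecture [1, 2].

% Hypothetical sorts and constants (illustration only).
robot(rob1).  object(cup1).  place(kitchen; study; library).
#const n = 3.  step(0..n).

% Causal law: moving to a place causes the robot to be at that place.
holds(loc(R, P), I+1) :- occurs(move(R, P), I), robot(R), place(P), step(I), I < n.

% State constraint: an object held by the robot shares the robot's location.
holds(loc(O, P), I) :- holds(in_hand(R, O), I), holds(loc(R, P), I),
                       robot(R), object(O), place(P), step(I).

% Executability condition: the robot cannot move to a place it already occupies.
-occurs(move(R, P), I) :- holds(loc(R, P), I), robot(R), place(P), step(I).

% Inertia: fluents persist by default unless there is evidence to the contrary.
holds(F, I+1) :- holds(F, I), not -holds(F, I+1), step(I), I < n.
-holds(F, I+1) :- -holds(F, I), not holds(F, I+1), step(I), I < n.

Given an initial state, a goal, and a choice rule over occurs/2, such a description becomes a planning program whose answer sets encode the abstract plans discussed below.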
Knowledge representation and reasoning: Of the two resolutions, the robot represents and reasons with commonsense domain knowledge, including cognitive theories, at the coarse resolution. For example, a robot fetching objects in an office building reasons with its knowledge of the attributes and default room locations of domain objects. It also has an adaptive theory of intentions that encodes principles of non-procrastination and persistence. The fine-resolution transition diagram is defined as a refinement of the coarse-resolution transition diagram, introducing a theory of observations that models the robot's ability to sense the values of domain fluents. A robot in an office building would (for example) now consider grid cells in rooms and object parts, attributes that were previously abstracted away by the designer. The definition of refinement guarantees that, for any given coarse-resolution transition, there exists a path in the fine-resolution diagram between states that are refinements of the coarse-resolution states. Also, the refined diagram is randomized to model non-determinism in action outcomes. For any given goal, non-monotonic logical reasoning at the coarse resolution provides a plan of intentional abstract actions; this is achieved using Answer Set Prolog, a declarative programming paradigm [4]. The robot implements each abstract transition as a sequence of concrete actions by automatically identifying (i.e., zooming to) and reasoning with the relevant part of the fine-resolution diagram. Execution in the fine resolution uses probabilistic models of the uncertainty (e.g., in perception and actuation), with the outcomes added to the coarse-resolution history for subsequent reasoning [1, 2].
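The prioritized defaults mentioned above, e.g., “cups are usually in the kitchen” (used in the first execution example in Section 3), can be encoded non-monotonically. The fragment below is a minimal clingo-style sketch with hypothetical relation names; it is a simplified stand-in for, not a reproduction of, the architecture's representation of defaults and their exceptions [1, 2].

% Hypothetical objects and places (illustration only).
cup(cup1).  place(kitchen; study; library).

% Default d1: cups are usually in the kitchen, unless something indicates otherwise.
holds(loc(C, kitchen), 0) :- cup(C), not ab(d1(C)), not -holds(loc(C, kitchen), 0).

% A recorded observation of a cup elsewhere is an exception that defeats the default.
ab(d1(C)) :- obs(loc(C, P), true, 0), cup(C), place(P), P != kitchen.
holds(loc(C, P), 0) :- obs(loc(C, P), true, 0), cup(C), place(P).

% With no observations, the program concludes holds(loc(cup1, kitchen), 0); adding
% the fact obs(loc(cup1, study), true, 0). withdraws that conclusion non-monotonically.

Such defaults inform the initial coarse-resolution plan, and observations and action outcomes recorded in the history can later defeat or reinforce them.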
Interactive learning and transparency: Reasoning with incomplete knowledge can result in incorrect or suboptimal outcomes. At the same time, state-of-the-art machine learning methods (e.g., those based on deep networks) require many labeled examples and considerable computational resources, which are often not available in practical robot domains.

The architecture supports three strategies for the incremental, efficient acquisition of knowledge of previously unknown action capabilities and axioms: (i) verbal descriptions of observed behavior; (ii) exploration of new transitions; and (iii) reactive exploration of unexpected transitions. These strategies are formulated as suitable interactive (e.g., inductive, reinforcement) learning problems. Reasoning and learning guide each other, enabling the automatic identification and use of only the relevant information to construct mathematical models for the different formulations [5]. For example, to estimate the stability of objects in a scene, the robot first attempts to reason with domain knowledge and information (e.g., object category, spatial relations) extracted from input images. Relevant regions of interest are automatically extracted from images for which reasoning is unable to make a decision (or makes an incorrect decision), and are used to train a deep network. Information from these regions is also used to induce axioms for subsequent reasoning (Figure 1, right). This approach substantially improves reliability and efficiency in comparison with deep network methods [3, 6].

The architecture supports transparent reasoning and learning, i.e., explainable agency, by encoding a theory of explanations comprising: (i) claims about representing, reasoning with, and learning knowledge to support relational descriptions of decisions and beliefs; (ii) a characterization of explanations based on representational abstraction, and explanation specificity and verbosity; and (iii) a methodology for constructing explanations. This theory is implemented in conjunction with the components summarized above (see Figure 1, right). The robot then provides on-demand relational descriptions of decisions and beliefs in response to different types of questions (e.g., descriptive, contrastive, counterfactual) posed by a human. The human can interactively obtain descriptions at the desired abstraction, specificity, and verbosity, with the robot posing disambiguation questions to the human as needed [3, 7].

3. Execution Trace

The following execution traces demonstrate the working of the architecture.

Execution Example 1. [Planning and Learning] Consider a robot in a study that is asked to fetch a cup.

• The plan of abstract actions, move(rob1, kitchen), pickup(rob1, C), move(rob1, study), putdown(rob1, C), is based on the default knowledge that cups are usually in the kitchen.

• For each abstract transition, the relevant (zoomed) fine-resolution description is identified (e.g., only cells in the study and the kitchen are relevant to the first move) and used to obtain a probabilistic policy that is invoked repeatedly to execute a sequence of concrete actions (e.g., the robot is in a cell of the kitchen after the first move).

• Suppose the robot's attempt to pick up a cup in the kitchen fails. Using the knowledge that the cup is heavy and its arm is light, the robot learns the axiom: impossible pickup(rob1, C) if arm(rob1, light), obj_weight(C, heavy).

• When asked to provide a detailed description after plan execution, the robot revises the abstraction level to use the fine-resolution description.
Human: “Please describe the executed plan in detail.”
Robot: “I moved to cell c2 in the kitchen. I picked the large cup by its handle from the counter [...] I moved to cell c4 of the study. I put the cup down on the red table.”
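The axiom learned in the third step above can be added to the coarse-resolution ASP program so that subsequent plans avoid the failed action. The fragment below is a minimal clingo-style sketch of one such translation; the generate-and-test portion and the relation names are hypothetical simplifications of the architecture's planning programs [1, 2].

% Hypothetical static attributes relevant to the learned axiom (illustration only).
robot(rob1).  cup(cup1; cup2).  place(kitchen; study).
arm(rob1, light).  obj_weight(cup1, heavy).  obj_weight(cup2, light).
#const n = 4.  step(0..n).

% Candidate abstract actions.
action(move(rob1, P)) :- place(P).
action(pickup(rob1, C)) :- cup(C).
action(putdown(rob1, C)) :- cup(C).

% Learned executability condition: a robot with a light arm cannot pick up a heavy object.
-occurs(pickup(rob1, C), I) :- arm(rob1, light), obj_weight(C, heavy), step(I).

% Generate at most one action per step; candidate plans must respect the learned condition.
{ occurs(A, I) : action(A) } 1 :- step(I), I < n.
:- occurs(A, I), -occurs(A, I).

With this constraint, no answer set includes pickup(rob1, cup1) at any step, mirroring how the learned axiom prunes plans that would repeat the failed pickup.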
Execution Example 2. [Explanation and Disambiguation] In the simulated scenario in Figure 2, the human asks the robot to “Move the yellow object on the green cube.” The reference to “yellow object” is ambiguous, and the robot asks for clarification.

• Robot: “Should I move yellow duck on the green cube?”
Human: “No. Move yellow cylinder on the green cube.”

• The robot computes a plan: pick up green mug; put green mug on table; pick up red cube; put red cube on table; pick up yellow cube; put yellow cube on table; pick up yellow cylinder; put yellow cylinder on green cube.

• The robot traces its beliefs and axioms to answer questions after plan execution.
Human: “Why did you not pick up red cube at step 1?”
Robot: “Because the red cube was below the green mug.”
Human: “Why did you move yellow cube to the table?”
Robot: “I had to put the yellow cylinder on top of the green cube. The green cube was below the yellow cube.”

[Figure 2: Example.]

Summary: Implementing principles such as stepwise iterative refinement and relevance, and exploiting the interplay between representation, reasoning, and learning, are key steps towards achieving cognitive adequacy in architectures for robots. Such an architecture, which combines knowledge-based reasoning and data-driven learning, has provided promising results in simulation and on physical robots [1, 2, 3, 5].

Acknowledgments
This work is the result of research threads pursued in collaboration with Ben Meadows, Tiago Mota, Heather Riley, Rocio Gomez, Michael Gelfond, Shiqi Zhang, and Jeremy Wyatt. This work was supported in part by U.S. ONR Awards N00014-13-1-0766, N00014-17-1-2434, and N00014-20-1-2390, AOARD award FA2386-16-1-4071, and U.K. EPSRC award EP/S032487/1.

References
[1] R. Gomez, M. Sridharan, H. Riley, What do you really want to do? Towards a Theory of Intentions for Human-Robot Collaboration, Annals of Mathematics and Artificial Intelligence, special issue on commonsense reasoning 89 (2021) 179–208.
[2] M. Sridharan, M. Gelfond, S. Zhang, J. Wyatt, REBA: A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics, Journal of Artificial Intelligence Research 65 (2019) 87–180.
[3] T. Mota, M. Sridharan, A. Leonardis, Integrated Commonsense Reasoning and Deep Learning for Transparent Decision Making in Robotics, SN Computer Science 2 (2021) 1–18.
[4] M. Gebser, R. Kaminski, B. Kaufmann, T. Schaub, Answer Set Solving in Practice, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2012.
[5] M. Sridharan, B. Meadows, Knowledge Representation and Interactive Learning of Domain Knowledge for Human-Robot Collaboration, Advances in Cognitive Systems 7 (2018) 77–96.
[6] H. Riley, M. Sridharan, Integrating Non-monotonic Logical Reasoning and Inductive Learning With Deep Learning for Explainable Visual Question Answering, Frontiers in Robotics and AI, special issue on Combining Symbolic Reasoning and Data-Driven Learning for Decision-Making 6 (2019) 20.
[7] M. Sridharan, B. Meadows, Towards a Theory of Explanations for Human-Robot Collaboration, Künstliche Intelligenz 33 (2019) 331–342.