Towards Safe and Reliable Robot Task Planning

Snehasis Banerjee
TCS Research & Innovation
Tata Consultancy Services, Kolkata, India
snehasis.banerjee@tcs.com

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

A long-standing goal of AI is to enable robots to plan in dynamic and uncertain environments by managing task failure intelligently. The reliability of a robot task planner has become essential to ensure operational safety. Reliability is generally determined by the probability that a task circumvents failures, while safety is related to the consequences of those failures. In this paper, methods and an architecture for handling reliability and safety in robotic task planning are presented. The paper uses a telepresence navigation scenario to describe candidate solutions such as the Task Reliability Graph and weighted reliability goals.

1 Introduction

Characteristics of a typical robotic AI system are (a) adaptability, (b) autonomy and (c) interaction. Decision making in AI systems can be fully autonomic or semi-autonomic when an external entity is involved (say, human operators in a telepresence scenario). Recently, 'AI Safety' has gained prominence as a research area due to the gradual infiltration of autonomous agents into everyday human environments. Work is ongoing to create an AI Safety Landscape [Espinoza et al. 2019] to nurture a common understanding of current needs, challenges, the state of the art and field practices relevant to safety in AI. [Amodei et al. 2016] discusses five core research areas related to AI safety. This paper focuses on two of them: (a) avoiding negative side effects, i.e., ensuring that agent actions meet safety constraints; and (b) safe exploration, i.e., how reliable the strategies to avoid safety hazards are in dynamic and open environments. A body of work [García et al. 2015] [Krakovna et al. 2018] applies machine learning (ML) based approaches to these areas. However, data-driven ML approaches, being non-contextual, lack the richness of semantics in decision making. This paper takes a knowledge-guided AI planning approach to the aforementioned problems, applied specifically to a telepresence navigation scenario in an uncertain environment.

Reliability [Dhillon, B.S. 2015] is closely related to AI safety. Reliability can be defined as how probable it is that the agent successfully completes a task without failures, while safety is determined by the direct or indirect consequences of the failures. In the case of a semi-autonomous agent like a telepresence robot, reliability is hampered for several reasons: random component and sensor failures, human errors, systematic hardware failures (degradation over time), software errors and a dynamic environment. Related works such as [Axelrod, B. 2018] and [Fisac, J.F. 2018] have shown the importance of policy-level safety in robot planning under dynamic obstacle scenarios. However, attaching safety aspects to task planning has remained neglected in the literature; addressing that gap is the key highlight of this paper, explained with the help of a telepresence navigation scenario.

AI task planning is based on decision-making strategies over action sequences, typically for execution by intelligent and autonomous agents, especially robots. In known environments with available models or maps, planning can be done offline. In dynamically changing unknown environments and in open-world scenarios, the strategy needs to be revised and adapted while the plan is executing. This requires continuous monitoring of state variables (the states of objects in the environment). An example of a discrete object state variable is a door in the open or closed position; a continuous state variable is also possible, say the angle by which the door is open (a semi-closed position). The choice of object state variable depends on the domain and on how much granularity the solution needs to accomplish a given goal. Typically, PDDL (Planning Domain Definition Language) [Gerevini et al. 2009] and its variants are used to represent the problem, initial conditions, goal state and action sequences. This is passed on to a suitable planner (based on the semantics used in the PDDL) to output a suitable sequence of tasks.
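To make the notion of object state variables concrete, the following minimal Python sketch shows a discrete and a continuous state variable for the door example above. The class and the passability rule are hypothetical illustrations, not part of any planner API:

```python
from dataclasses import dataclass

@dataclass
class DoorState:
    """Illustrative object state for a door, as tracked by a monitor."""
    is_open: bool          # discrete state variable: open or closed
    open_angle_deg: float  # continuous state variable: how far open

    def is_passable(self, robot_clearance_deg: float = 70.0) -> bool:
        # Granularity choice: the door counts as passable only if it is
        # open wide enough for the robot base to fit through.
        return self.is_open and self.open_angle_deg >= robot_clearance_deg

# A semi-closed door is 'open' as a discrete variable, yet not passable.
door = DoorState(is_open=True, open_angle_deg=35.0)
print(door.is_passable())  # False
```

The continuous variable carries information the discrete one cannot, which is exactly the granularity trade-off discussed above.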
To take care of different constraints and semantic descriptions, a large number of planning methodologies exist, such as metric-based planning, spatial planning, temporal planning, probabilistic planning, belief-based planning, and planning in open worlds and open domains. However, while the focus of the planning research community has been on planners with faster computing time and richer semantic processing capability, safety and reliability of tasks as a planning feature has remained mostly neglected. This paper describes an architecture and methodology to tackle safety and reliability as a first step towards filling this gap.

2 Applicability in Telepresence Navigation

To illustrate the scenarios in which safety and reliability become essential for robotic task planning, a telepresence use case is taken here. A telepresence robot (a type of service robot) helps a human user (called the operator) to be virtually present at a remote location. Telepresence robots have started to gain popularity in schools, corporate offices, hospitals, clinics, warehouses and conferences. The default telepresence robot comes with a camera and navigation wheels, on top of which more sensors can be added to enhance its perception of the environment. A telepresence robot can operate in fully guided, semi-autonomous or autonomous mode, as per application needs and the sophistication of the software and hardware used for that application. However, fully guided mode is discouraged: if the user's instructions are strictly followed, the robot may cause harm to itself or others due to the lack of any restrictions.
Hence, the desired modes of operation are either autonomous or semi-guided with restrictions (constraints). If a high-level instruction is provided by the user, say 'Find where the paper shredder is kept', the robot analyses the instruction and decomposes the task into a sequence of the most optimal sub-tasks to achieve the goal. This requires the robot to get a visual idea of the object it has to find, namely the 'paper shredder'. Also, if the object is not currently visible, it needs to explore the environment until it finds the goal object. In semi-guided mode, the successful working of the robot relies heavily on the communication network, since a remote user is controlling the robot based on the feedback the user perceives from the sensors attached to the robot (such as the ego-view camera). A communication failure can make semi-guided mode disastrous, especially if it happens while a sub-task is executing: the robot may collide with obstacles because it continues its trajectory in the absence of the expected interrupt. Also, while exploring, if the robot comes to a zone of continuously moving obstacles (like living beings or other robots), it may come to a standstill (with the goal of moving forward) to keep a safe distance. A considerable amount of work has been done on collision avoidance algorithms in navigation [Hoy et al. 2015], and PDDL-based planning approaches [Gayathri et al. 2018] [Ingrand et al. 2017] have been used extensively in robot navigation tasks. However, these algorithms depend on continuous input from sensors that may be faulty and provide erroneous readings or sensed data, which eventually leads to collision. There are also issues in bridging software and hardware instructions when interacting with physical devices. An example: if a robot is given a software instruction to rotate by an angle X°, the robot's physical wheel may actually rotate by (X+n)°, thereby making the robot unsafe in precise navigation.
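One mitigation for such actuation error is a closed-loop check against odometry feedback. The sketch below simulates this; the actuation and feedback calls are stand-ins for a real robot interface (for instance, a ROS velocity command followed by an odometry read-back), and the random term models the (X+n)° wheel error described above:

```python
import random

def rotate_with_feedback(target_deg: float, tolerance_deg: float = 2.0,
                         max_retries: int = 3) -> bool:
    """Closed-loop rotation: command, measure, correct."""
    heading = 0.0
    for _ in range(max_retries):
        remaining = target_deg - heading
        heading += remaining + random.uniform(-3.0, 3.0)  # imperfect wheel
        if abs(target_deg - heading) <= tolerance_deg:
            return True   # within tolerance: safe for precise navigation
    return False          # still off target: flag the step as unreliable

print(rotate_with_feedback(90.0))
```

Returning a failure flag rather than silently continuing lets the planner treat the rotation step as unreliable and re-plan, in line with the monitoring approach of Section 3.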
A lack of domain and commonsense knowledge can also have negative consequences for successful task completion. Suppose a robot can see the target object via its camera, but the terrain is not safe for navigation (say, a staircase in the case of a wheeled robot); blindly following the instruction will lead the robot to fall and break down. There can also be adversarial situations that fool the robot's perception and thereby lead to task failure, for example target objects appearing in a mirror, on an electronic display screen (say, a television) or in portraits. Another limitation a robot may face is sensor range: if a hazard or moving obstacle is not within the reception coverage of a sensor, the robot cannot detect whether the selected trajectory is correct. A robot's visual perception may give only an imperfect view of the clear path and of objects, due to sensor errors, unfavorable lighting conditions or the environmental layout. If the telepresence robot is provided with tactile object manipulation capabilities, the risk becomes more severe. An example: in a pantry environment, the robot lights a fire to serve a user instruction without having a gas and smoke detection sensor. A related safety issue is handling fragile objects without care, because the robot lacks the knowledge and experience of holding such objects. All of this leads to task failures and non-adherence to safety, and demands a dedicated focus on safety and reliability in robotic task planning.

3 Proposed Architecture and Methodology

In a human-robot interaction (HRI) setup, the user passes on a high-level task instruction (via text, audio, etc.) that is captured by the robotic system. The system needs to decide an optimal sequence of steps, to be carried out as actuation in the environment, in order to satisfy the goal of the user instruction. A generic architecture (Fig. 1) is presented along with the significance of each individual module in aiding optimal task execution. Granular details of each module are skipped to keep the focus on the safety and reliability aspects.

[Figure 1: Architecture for Reliable & Safe Task Planning]

The Cognitive Engine's main role is observe-orient-decide-act (OODA). Its main jobs are: (a) handling failures and errors; (b) learning when things go right and wrong, in other words learning from negative and positive experience in various task execution scenarios; (c) ensuring safety in operations; (d) predicting the likelihood of a task failure or safety risk given the current state. Its surrounding modules are as follows:

• Reasoner: This module infers semantic facts by combining apriori knowledge and perceived realtime facts while operating in the working memory [Mukherjee, D. et al. 2013]. Breaking a given high-level instruction into smaller tasks and overcoming ambiguity is taken care of by this module. In uncertain environments, decisions can be taken using standard methodologies for reasoning under uncertainty [Kern-Isberner, G. et al. 2017].

• Planner: Planning deals with modeling the relations between perceptions and actions, using a world model to estimate deliberation steps as per the goal. When the problem domain, objects, relevant actions, and initial and goal states are properly encoded in a semantic format such as PDDL, a planner can find the best suitable steps to reach the goal state. In dynamic and open worlds with high scope for task failures, dynamic planning becomes essential: object state variables are monitored and, if a failure state is nearing, re-planning happens from the current state towards the goal.

• Reinforcement Learning (RL) Module: As the robot agent explores the world and carries out sub-tasks to reach its goal, it continuously learns through its perception capabilities and the consequences of its actions in the environment. The action and the path taken by the robot go back as direct feedback on what to do when confronted with a similar situation; this is modeled by RL, with rewards for successfully reaching the goal under constraints and penalties (negative rewards) for failures. In simple worlds where only knowledge-based decision making is used, the RL module can be omitted altogether. However, in complex worlds where the granularity of knowledge may be limited in expression, a hybrid approach mixing semantic decision making with RL policy-based actuation decisions will prove fruitful.

• Knowledge Manager: When dealing with data from various sensors, it is essential to bring all the information into a common semantic knowledge format suitable for reasoning and planning. The purpose of this module is to gather, curate and unify a variety of knowledge (facts, relations, ontologies). It also connects to the Semantic Web via SPARQL endpoints for accessing, on demand, relevant knowledge missing from the current working memory.
• Knowledge Store: With knowledge represented in various formats, a requirement is to curate it in blocks. Due to dynamic environments, the knowledge needs to be updated and overwritten based on learned experience or the gathering of fresh, unseen knowledge.

• Learned Model: This serves as a store for updating learned models (safety and reliability weights over a task and a task group) from actuation feedback, via reinforcement learning, the monitoring module of the system, or both. Manual entry of initial weights and range thresholds makes the system robust to overfitting.

• Perception Module: A robot is equipped with multiple sensors required for the successful completion of a task and for its own operation. The perception module is designated with processing raw sensor readings and converting them into usable semantic information for the cognitive engine. As an example, a camera image is processed to identify objects and context from the robot's ego view.

• Actuation Module: Actuation of the robot is based on physical interaction with the environment. This is generally achieved by sending motor movement instructions from the software interaction module of the robot's base system (usually ROS, the Robot Operating System), on top of which the cognitive engine is running.

• Monitoring Module: This is the most important module (Section 3.5) from the point of view of safe and reliable task execution. The module oversees (a) monitoring the current environment state, including the robot and objects; (b) filtering the perception module for any conflicting sensed information; (c) providing realtime input to decision making.

The following methods shed light on the system objectives.

3.1 Using Safety and Reliability in Metric PDDL

Assigning reliability and safety to individual task action sequences in semantic forms like PDDL is a straightforward solution. The intricacies of safety and reliability can be included in the goal objective function, and the optimal path search can then be done by metric-supporting planners as follows:

    (:metric maximize (total-reliability))
    (:metric maximize (total-safety))

Task actions will have an effect variable like:

    :effect (and ... (increase (total-reliability) R)
                     (increase (total-safety) S))

A drawback of this technique is that one is required to know all the safety and reliability values for each task beforehand, which might not be possible in a cold-start scenario. If probability and uncertainty are involved, PPDDL (probabilistic PDDL) with unknown states and probabilities is a more suitable representation. Sensor failures can be modeled based on sensor error modeling techniques [Berk, M. et al. 2019], while task-level failures can be learned from actual task execution in real life or via realistic simulations in robot simulators such as Webots, Gazebo, Gibson and AI2Thor.
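As a concrete illustration, the Python sketch below renders a PDDL action whose effects accumulate learned per-task values R and S into the metric fluents shown above. The per-task values, the action template and the predicate names are illustrative assumptions, not output of a specific planner:

```python
# Hypothetical per-task (reliability, safety) values, e.g. learned from
# simulation runs, with cold-start defaults entered manually.
learned_values = {"move-step": (0.9, 0.8), "lift-object": (0.6, 0.5)}

def emit_pddl_action(name: str) -> str:
    """Render a PDDL action whose effects accumulate the learned
    reliability R and safety S into the planner's metric fluents."""
    r, s = learned_values[name]
    return (f"(:action {name}\n"
            f"  :parameters (?r - robot)\n"
            f"  :precondition (ready ?r)\n"
            f"  :effect (and (completed-{name} ?r)\n"
            f"               (increase (total-reliability) {r})\n"
            f"               (increase (total-safety) {s})))")

print(emit_pddl_action("lift-object"))
```

Regenerating the domain file whenever the learned table changes keeps the planner's metric search aligned with the latest observed task behaviour.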
3.2 Safety Priorities and Vulnerability Values

In the context of AI safety, it is often observed that total focus is given to human safety. However, in situations where a robot agent is interacting with the environment, it is also essential to give safety priorities to other living beings, to valuable objects and to the robotic agent itself (it also has value). This is achieved by maintaining a semantic database that contains the relative priorities and properties of entities, enabling decision making under the specified constraints. Diverse situations and applications will have different priority values; hence it is better to have a relative index instead of an absolute one. In the case of service robots, the end user should be provided with an interface to set custom values, as what humans value most varies from one person to another, which is a technical challenge. As an example, a simple cup might have little material value in terms of money, but if a memorable human event is attached to it, its relative value becomes higher. A naive strategy is to keep a safe distance from high-value entities while actuating. A significant aspect to consider is that entities in an environment can also be tagged with degrees of vulnerability. As an example, glass objects in the navigation path of a robot may prove vulnerable if the robot somehow deviates from its intended route (resulting in a collision). However, based on time and emergency constraints, the robot may have to execute tasks that carry some risk of failure and safety relaxations. A sample manual entry in tabular form:

    Class: Robot      | Vulnerability: 0.3 | Safety Priority: 0.6
    Class: Glass Vase | Vulnerability: 0.9 | Safety Priority: 0.4
    Class: Human      | Vulnerability: 0.6 | Safety Priority: 1.0

3.3 Task Reliability Graph

This section describes the concept of modeling and handling task failures in terms of a Task Reliability Graph (TRG). Some terminology is defined first. (a) Start/Init: the initial state of the world. (b) Goal: the expected state of the world after the complete plan is executed. (c) Task Node: a sub-task representing an action sequence that has pre-conditions and effects on the world; this can also be thought of as an action recipe. (d) Task Reliability: how reliable the task is; this value can be discrete (such as high, medium, low) or continuous in a specified range (say 0 to 1, with normalization). It can be learned by observing task executions in the real world or in computer simulations. For example, the task of moving straight can be considered reliable, while rotating on an axis may turn out to be unreliable under precision-demanding scenarios (like rotating by exactly r°). (e) Task Transition: the switch from one task to another following the planner-set sequence. When a task completes execution, control is given back to the cognitive engine from the actuator, which then initiates the next task in the planned pipeline. Some task transitions are seamless, like between moving straight and rotation. However, a transition between the actions 'turning off lights' and 'moving ahead' is not reliable, as movement in the dark is prone to failure without sensors that work in the dark. Unreliable transitions mean there is a high risk of failure. (f) Task Transition Reliability: how reliable the transition from one task to another is; its numeric values can likewise be learned by monitoring task transitions in the real world and in simulations. (g) Safe Zone: a world state in which the chance of the robot doing an unintended action is almost nil. (h) Risk Zone: a world state in which there is a chance of task failure, but recovery or rollback to a safe zone is still possible. (i) Failure Zone: a world state in which the robot cannot recover and needs external intervention; this world state is more prone to safety violations.

[Figure 2: Task Reliability Graph and Reliability Zones]

We explain the TRG with the help of a simple telepresence scenario. Some of the sub-tasks can be: open a door (T1), close a door (T2), turn on lights (T3), turn off lights (T4), move a step (T5; this can be further subdivided into sub-tasks of vector movements), apply pressure to hold an object (T6), lift an object (T7), move an object (T8). Suppose the end user at a remote location issues a command to find an object, a 'mobile phone' left somewhere at home. In this case, the transition T4 → T5 is risky; T6 → T8 is also risky if the object (here the phone) cannot be moved without lifting. Had the object been something heavy like a portable cupboard, lifting it might cause the robot to lose balance and damage itself. If the robot has sensors that work in the dark (like an infrared camera), then T4 → T5 is not risky. So each task transition is characterized by the robot's current perception capacity as well as the task and environment state. If the current scene lacks dynamic obstacles, risk factors are minimized automatically. Hence, the reliability values of tasks and task transitions depend on the scenario and its constraints, and accordingly the optimal task path for the same goal (user instruction) will vary.

Fig. 2 shows an instance of a Task Reliability Graph where the objective is to reach the Goal node from the Start node. The optimal path for a given problem is found by looking simultaneously at multiple task nodes (having reliability weights), their connecting edges (having task transition reliability weights), and the safety values assigned to the edges and nodes. Depending on the constraints (limited time, no risk), the optimal path will differ. To simplify understanding, reliability weights are not considered in the following cases.

Case 1: If safety is the highest priority (non-negotiable): no path exists until safety is relaxed.
Case 2: If time is the highest priority (emergency): Start → Task1 → Task3 → Goal.
Case 3: If time and safety have relaxed priority: Start → Task0 → Task4 → Task5 → Task7 → Task3 → Goal. (This path has all safe 'green' transitions, but two risky tasks.)
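The sketch below is one plausible realization of TRG path selection for the lights/move example, not the paper's definitive algorithm. The reliability priors are illustrative; in practice they come from learned models, and the edge safety flags would be derived from the safety priorities and vulnerability values of Section 3.2:

```python
# Node task reliabilities and edge (transition reliability, is_safe) priors.
node_rel = {"Start": 1.0, "T3": 0.95, "T4": 0.9, "T5": 0.7, "Goal": 1.0}
edges = {
    ("Start", "T4"): (0.9, True),   # turn off lights
    ("T4", "T5"): (0.8, False),     # moving in the dark: risky transition
    ("Start", "T3"): (0.8, True),   # turn on lights first
    ("T3", "T5"): (0.85, True),
    ("T5", "Goal"): (0.9, True),
}

def best_path(src: str, dst: str, require_safe: bool):
    """Depth-first search over the TRG, scoring each path by the product
    of node and transition reliabilities; unsafe transitions are excluded
    outright when safety is non-negotiable."""
    best = (0.0, None)
    def dfs(node, score, path):
        nonlocal best
        if node == dst:
            best = max(best, (score, path))
            return
        for (a, b), (rel, safe) in edges.items():
            if a == node and b not in path and (safe or not require_safe):
                dfs(b, score * rel * node_rel[b], path + [b])
    dfs(src, node_rel[src], [src])
    return best

print(best_path("Start", "Goal", require_safe=True))   # lit route via T3
print(best_path("Start", "Goal", require_safe=False))  # dark route via T4
```

Toggling the safety requirement changes the selected route, mirroring how the constraint cases above yield different optimal paths for the same goal.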
It is important to keep note of visited nodes and of where failures occur more often in graph cycles. Another parameter to consider is the cost of rollback: how far back from the current node in the graph the agent needs to travel in order to reach a safe state. A further point: if the robot agent does not have suitable sensors to monitor the state needed to carry out a vulnerable task, then such a task by default falls in the Risk Zone.

3.4 State Reliability

Jointly monitoring the path to the goal in solution space as well as the state variables is a better approach than relying on only one aspect. As an example, tasks can fail if the battery powering the robot (an object state) gets drained, even though there were no issues in the reliability of the task at hand and its transitions. For each goal, a task path is created that contains the states of variables. Simultaneously, the relevant world state (a subset can be extracted by looking at the problem, the goal and the possible paths) needs to be monitored, and reliability can be assigned to (a) each relevant object state and (b) a relevant subset of world states representing a possible state. Such a subset may not map to a specified task template but can be treated as an unknown state; it should not be generalized into a single dead state. If there are m state variables with n possible discrete values each, the number of reachable world states will be much less than the maximum possible, as the co-occurrence of certain value combinations [m1(n1), m2(n2)] is impossible due to semantic and logical constraints. For example, a state variable 'light' in the 'off' state will make the camera-sensed state variables null, since in darkness there is no visibility. These constraints can be derived from ontological descriptions and common-sense knowledge, and can also be specified in PDDL under the :constraints section. This leads to task reliability prediction as a weighted objective:

    maximize ( w1 · world state reliability
             + w2 · individual state reliability
             + w3 · task transition reliability
             + w4 · task execution reliability )
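A direct translation of this objective into code is straightforward; the weights below are illustrative defaults (not values given in the paper) and would in practice be tuned from monitored executions, for example via the Learned Model store:

```python
def task_reliability_score(r_world: float, r_state: float,
                           r_transition: float, r_execution: float,
                           w=(0.2, 0.2, 0.3, 0.3)) -> float:
    """Weighted reliability prediction per the objective above."""
    w1, w2, w3, w4 = w
    return (w1 * r_world + w2 * r_state
            + w3 * r_transition + w4 * r_execution)

# A healthy world and object state cannot mask a shaky task transition.
print(task_reliability_score(0.95, 0.9, 0.4, 0.8))  # ≈ 0.73
```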
3.5 Task Monitoring and Decision Making

Any machine learning model is bound to fail at times, as its predictions cannot be 100% certain; a model is only as good as its features [Banerjee, S. et al. 2016]. The fundamental aspect of safe and reliable task planning is to minimize the level of uncertainty. If a world model were exhaustive, uncertainty would be zero; in real life, however, it is very difficult to have an exhaustive state space description on which an agent can actuate. So a hybrid approach that entangles uncertainty handling in the core decision-making process is a practical way out:

Algorithm 1: Safe and Reliable Task Planning
Result: Task success or failure under constraint set C
Parameters:
  perception ← sensor input stream
  actuation ← movement or manipulation by the agent
  knowledge ← link to the semantic knowledge store
  software ← link to software modules and libraries
  goal ← target goal state or final task state to reach
  CS ← current state of the agent
Begin:
  plan ← software.plan(perception, knowledge, goal)
  TRG ← initialize Task Reliability Graph with priors
  while goal ≠ CS and pre-conditions satisfied do
    CS ← plan.nextTaskStep()
    if constraints in CS w.r.t. TRG ⊆ set C then
      TRG.evaluate(perception)  // check for changes
      if changes detected in world state then
        S1 ← software.RL.evaluate(perception)  // next step from the reinforcement-learned model
        S2 ← next step from TRG after evaluating reliability of world state, CS, task transition and task execution
        actuation ← voting(S1 ∩ S2)
      else
        actuation ← voting(TRG.nextActuation() ∩ software.RL.nextActuation())
      end
      TRG.update()          // weights of edges and nodes
      software.RL.update()  // update rewards from current scene and task status
      knowledge.update()    // object state values
    else
      plan ← software.replan(perception, TRG)
    end
    if goal reached or exit criterion met then
      actuation ← STOP; wait for next command
    end
  end
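The voting step in Algorithm 1 is left abstract. The sketch below shows one plausible voting rule over the RL suggestion (S1) and the TRG suggestion (S2); the (action, risk) tuple encoding and the 0.5 risk threshold are assumptions for illustration, not fixed by the paper:

```python
def vote(rl_step, trg_step):
    """Consensus voting over S1 (RL) and S2 (TRG) from Algorithm 1:
    actuate on agreement, otherwise take the lower-risk suggestion,
    and stop when both options are too risky."""
    if rl_step == trg_step:
        return rl_step                      # S1 ∩ S2 non-empty: actuate
    if min(rl_step[1], trg_step[1]) > 0.5:  # both options risky:
        return ("STOP", 0.0)                # halt and wait for re-planning
    return min(rl_step, trg_step, key=lambda s: s[1])  # safer option

print(vote(("move-forward", 0.2), ("move-forward", 0.2)))  # agreement
print(vote(("move-forward", 0.7), ("rotate-left", 0.3)))   # safer TRG step
```

Stopping on disagreement under high risk keeps the robot in the Safe or Risk Zone rather than letting a single uncertain model push it into the Failure Zone.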
4 Conclusion

The intention of this paper is to help ensure that future AI systems that use task planning are not just optimistically safe but robustly, verifiably safe, because they were designed and built with safety and reliability in mind from the ground up. The strategies laid out are generic enough for widespread adoption across safe task planning use cases. Experimentation on multiple use cases, in both simulation and real-life situations, will yield more insights into the proposed approaches. Future work will include a mathematical formulation with safety guarantees [Polymenakos, K. et al. 2019] to model agent (robot) actuation in an uncertain world.

5 References

[Amodei et al. 2016] Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[Axelrod, B. et al. 2018] Provably safe robot navigation with obstacle uncertainty. The International Journal of Robotics Research 37.13-14: 1760-1774.
[Banerjee, S. et al. 2016] Towards wide learning: Experiments in healthcare. ML4Health, NIPS 2016. arXiv preprint arXiv:1612.05730.
[Berk, M. et al. 2019] Reliability assessment of safety-critical sensor information: Does one need a reference truth? IEEE Transactions on Reliability 68.4: 1227-1241.
[Dhillon, B.S. 2015] Robot System Reliability and Safety: A Modern Approach. CRC Press.
[Espinoza et al. 2019] Towards an AI Safety Landscape: An Overview. https://www.ai-safety.org.
[Fisac, J.F. et al. 2018] Probabilistically safe robot planning with confidence-based human predictions. arXiv preprint arXiv:1806.00109.
[García et al. 2015] A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16.1: 1437-1480.
[Gayathri et al. 2018] Ontology based knowledge representation technique, domain modeling languages and planners for robotic path planning: A survey. ICT Express 4.2: 69-74.
[Gerevini et al. 2009] Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence 173.5-6: 619-668.
[Hoy et al. 2015] Algorithms for collision-free navigation of mobile robots in complex cluttered environments: A survey. Robotica 33.3: 463-497.
[Ingrand et al. 2017] Deliberation for autonomous robots: A survey. Artificial Intelligence 247: 10-44.
[Kern-Isberner, G. et al. 2017] Many facets of reasoning under uncertainty, inconsistency, vagueness, and preferences: A brief survey. KI - Künstliche Intelligenz 31.1: 9-13.
[Krakovna et al. 2018] Measuring and avoiding side effects using relative reachability. arXiv preprint arXiv:1806.01186.
[Mukherjee, D. et al. 2013] Towards efficient stream reasoning. On the Move to Meaningful Internet Systems, Springer, pp. 735-738.
[Polymenakos, K. et al. 2019] Safety guarantees for planning based on iterative Gaussian processes. arXiv preprint arXiv:1912.00071.