Towards Safe and Reliable Robot Task Planning

Snehasis Banerjee
TCS Research & Innovation
Tata Consultancy Services, Kolkata, India
snehasis.banerjee@tcs.com

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

A long-standing goal of AI is to enable robots to plan in dynamic and uncertain environments by managing task failure intelligently. The reliability of a robot task planner has become essential to ensure operational safety. Reliability is generally determined by the probability that a task circumvents failures, while safety is related to the consequences of those failures. In this paper, methods and an architecture for handling reliability and safety in robotic task planning are presented. The paper uses a telepresence navigation scenario to describe candidate solutions such as the Task Reliability Graph and weighted reliability goals.

1 Introduction

Characteristics of a typical robotic AI system are (a) adaptability, (b) autonomy and (c) interaction. Decision making in AI systems can be fully autonomic or semi-autonomic when an external entity is involved (say, human operators in a telepresence scenario). Recently, 'AI Safety' has gained prominence as a research area due to the gradual infiltration of autonomous agents into everyday human environments. Work is ongoing to create an AI Safety Landscape [Espinoza et al. 2019] to nurture a common understanding of current needs, challenges, the state of the art and field practices relevant to safety in AI. [Amodei et al. 2016] discusses five core research areas related to AI safety. This paper focuses on two of them: (a) avoiding negative side effects, i.e., ensuring that agent actions meet safety constraints; and (b) safe exploration, i.e., how reliable the strategies to avoid safety hazards are in dynamic and open environments. A body of work [García et al. 2015] [Krakovna et al. 2018] applies machine learning (ML) based approaches to these areas. However, data-driven ML approaches, being non-contextual, lack the richness of semantics in decision making. This paper takes a knowledge-guided AI planning approach to the aforementioned problems, applied specifically to a telepresence navigation scenario in an uncertain environment.

Reliability [Dhillon, B.S. 2015] is closely related to AI safety. Reliability can be defined as how probable it is that the agent successfully completes a task without failures, while safety is determined by the direct or indirect consequences of the failures. In the case of a semi-autonomous agent like a telepresence robot, reliability is hampered for several reasons: random component and sensor failures, human errors, systematic hardware failures (degradation over time), software errors and a dynamic environment. Related works such as [Axelrod, B. 2018] and [Fisac, J.F. 2018] have shown the importance of policy-level safety in robot planning under dynamic obstacle scenarios. However, attaching safety aspects to task planning has remained neglected in the literature; addressing that gap is the key highlight of this paper, explained with the help of a telepresence navigation scenario.

AI task planning is based on decision-making strategies over action sequences, typically for execution by intelligent and autonomous agents, especially robots. In known environments with available models or maps, planning can be done offline. In dynamically changing unknown environments and in open-world scenarios, the strategy needs to be revised and adapted while the plan is executing. This requires continuous monitoring of state variables (the states of objects in the environment). An example of a discrete object state variable is a door in the open or closed position; a continuous state variable is also possible, say the angle by which the door is open (a semi-closed position). The choice of object state variable depends on the domain and on how much granularity the solution needs to accomplish a given goal. Typically, PDDL (Planning Domain Definition Language) [Gerevini et al. 2009] and its variants are used to represent the problem, initial conditions, goal state and action sequences. This is passed on to a suitable planner (based on the semantics used in the PDDL) to output a suitable sequence of tasks.
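To make the notion of object state variables concrete, the following minimal Python sketch shows a discrete and a continuous state variable for the door example above. The class and the passability rule are hypothetical illustrations, not part of any planner API:

```python
from dataclasses import dataclass

@dataclass
class DoorState:
    """Illustrative object state for a door, as tracked by a monitor."""
    is_open: bool          # discrete state variable: open or closed
    open_angle_deg: float  # continuous state variable: how far open

    def is_passable(self, robot_clearance_deg: float = 70.0) -> bool:
        # Granularity choice: the door counts as passable only if it is
        # open wide enough for the robot base to fit through.
        return self.is_open and self.open_angle_deg >= robot_clearance_deg

# A semi-closed door is 'open' as a discrete variable, yet not passable.
door = DoorState(is_open=True, open_angle_deg=35.0)
print(door.is_passable())  # False
```

The continuous variable carries information the discrete one cannot, which is exactly the granularity trade-off discussed above.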
To take care of different constraints and semantic descriptions, a large number of planning methodologies exist, such as metric-based planning, spatial planning, temporal planning, probabilistic planning, belief-based planning, and planning in open worlds and open domains. However, while the focus of the planning research community has been on planners with faster computing time and richer semantic processing capability, safety and reliability of tasks as a planning feature has remained mostly neglected. This paper describes an architecture and methodology to tackle safety and reliability as a first step towards filling this gap.

2 Applicability in Telepresence Navigation

To illustrate the scenarios in which safety and reliability become essential for robotic task planning, a telepresence use case is taken here. A telepresence robot (a type of service robot) helps a human user (called the operator) to be virtually present at a remote location. Telepresence robots have started to gain popularity in schools, corporate offices, hospitals, clinics, warehouses and conferences. The default telepresence robot comes with a camera and navigation wheels, on top of which more sensors can be added to enhance its perception of the environment. A telepresence robot can operate in fully guided, semi-autonomous or autonomous mode, as per application needs and the sophistication of the software and hardware used for that application. However, fully guided mode is discouraged: if the user's instructions are strictly followed, the robot may cause harm to itself or others due to the lack of any restrictions.
Hence, the desired modes of operation are either autonomous or semi-guided with restrictions (constraints). If a high-level instruction is provided by the user, say 'Find where the paper shredder is kept', the robot analyses the instruction and decomposes the task into a sequence of the most optimal sub-tasks to achieve the goal. This requires the robot to get a visual idea of the object it has to find, namely the 'paper shredder'. Also, if the object is not currently visible, it needs to explore the environment until it finds the goal object. In semi-guided mode, the successful working of the robot relies heavily on the communication network, since a remote user is controlling the robot based on the feedback the user perceives from the sensors attached to the robot (such as the ego-view camera). A communication failure can make semi-guided mode disastrous, especially if it happens while a sub-task is executing: the robot may collide with obstacles because it continues its trajectory in the absence of the expected interrupt. Also, while exploring, if the robot comes to a zone of continuously moving obstacles (like living beings or other robots), it may come to a standstill (with the goal of moving forward) to keep a safe distance. A considerable amount of work has been done on collision avoidance algorithms in navigation [Hoy et al. 2015], and PDDL-based planning approaches [Gayathri et al. 2018] [Ingrand et al. 2017] have been used extensively in robot navigation tasks. However, these algorithms depend on continuous input from sensors that may be faulty and provide erroneous readings or sensed data, which eventually leads to collision. There are also issues in bridging software and hardware instructions when interacting with physical devices. An example: if a robot is given a software instruction to rotate by an angle X°, the robot's physical wheel may actually rotate by (X+n)°, thereby making the robot unsafe in precise navigation.
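One mitigation for such actuation error is a closed-loop check against odometry feedback. The sketch below simulates this; the actuation and feedback calls are stand-ins for a real robot interface (for instance, a ROS velocity command followed by an odometry read-back), and the random term models the (X+n)° wheel error described above:

```python
import random

def rotate_with_feedback(target_deg: float, tolerance_deg: float = 2.0,
                         max_retries: int = 3) -> bool:
    """Closed-loop rotation: command, measure, correct."""
    heading = 0.0
    for _ in range(max_retries):
        remaining = target_deg - heading
        heading += remaining + random.uniform(-3.0, 3.0)  # imperfect wheel
        if abs(target_deg - heading) <= tolerance_deg:
            return True   # within tolerance: safe for precise navigation
    return False          # still off target: flag the step as unreliable

print(rotate_with_feedback(90.0))
```

Returning a failure flag rather than silently continuing lets the planner treat the rotation step as unreliable and re-plan, in line with the monitoring approach of Section 3.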
A lack of domain and commonsense knowledge can also have negative consequences for successful task completion. Suppose a robot can see the target object via its camera, but the terrain is not safe for navigation (say, a staircase in the case of a wheeled robot); blindly following the instruction will lead the robot to fall and break down. There can also be adversarial situations that fool the robot's perception and thereby lead to task failure, for example target objects appearing in a mirror, on an electronic display screen (say, a television) or in portraits. Another limitation a robot may face is sensor range: if a hazard or moving obstacle is not within the reception coverage of a sensor, the robot cannot detect whether the selected trajectory is correct. A robot's visual perception may give only an imperfect view of the clear path and of objects, due to sensor errors, unfavorable lighting conditions or the environmental layout. If the telepresence robot is provided with tactile object manipulation capabilities, the risk becomes more severe. An example: in a pantry environment, the robot lights a fire to serve a user instruction without having a gas and smoke detection sensor. A related safety issue is handling fragile objects without care, because the robot lacks the knowledge and experience of holding such objects. All of this leads to task failures and non-adherence to safety, and demands a dedicated focus on safety and reliability in robotic task planning.

3 Proposed Architecture and Methodology

In a human-robot interaction (HRI) setup, the user passes on a high-level task instruction (via text, audio, etc.) that is captured by the robotic system. The system needs to decide an optimal sequence of steps, to be carried out as actuation in the environment, in order to satisfy the goal of the user instruction. A generic architecture (Fig. 1) is presented along with the significance of each individual module in aiding optimal task execution. Granular details of each module are skipped to keep the focus on the safety and reliability aspects.

[Figure 1: Architecture for Reliable & Safe Task Planning]

The Cognitive Engine's main role is observe-orient-decide-act (OODA). Its main jobs are: (a) handling failures and errors; (b) learning when things go right and wrong, in other words learning from negative and positive experience in various task execution scenarios; (c) ensuring safety in operations; (d) predicting the likelihood of a task failure or safety risk given the current state. Its surrounding modules are as follows:

• Reasoner: This module infers semantic facts by combining apriori knowledge and perceived realtime facts while operating in the working memory [Mukherjee, D. et al. 2013]. Breaking a given high-level instruction into smaller tasks and overcoming ambiguity is taken care of by this module. In uncertain environments, decisions can be taken using standard methodologies for reasoning under uncertainty [Kern-Isberner, G. et al. 2017].

• Planner: Planning deals with modeling the relations between perceptions and actions, using a world model to estimate deliberation steps as per the goal. When the problem domain, objects, relevant actions, and initial and goal states are properly encoded in a semantic format such as PDDL, a planner can find the best suitable steps to reach the goal state. In dynamic and open worlds with high scope for task failures, dynamic planning becomes essential: object state variables are monitored and, if a failure state is nearing, re-planning happens from the current state towards the goal.

• Reinforcement Learning (RL) Module: As the robot agent explores the world and carries out sub-tasks to reach its goal, it continuously learns through its perception capabilities and the consequences of its actions in the environment. The action and the path taken by the robot go back as direct feedback on what to do when confronted with a similar situation; this is modeled by RL, with rewards for successfully reaching the goal under constraints and penalties (negative rewards) for failures. In simple worlds where only knowledge-based decision making is used, the RL module can be omitted altogether. However, in complex worlds where the granularity of knowledge may be limited in expression, a hybrid approach mixing semantic decision making with RL policy-based actuation decisions will prove fruitful.

• Knowledge Manager: When dealing with data from various sensors, it is essential to bring all the information into a common semantic knowledge format suitable for reasoning and planning. The purpose of this module is to gather, curate and unify a variety of knowledge (facts, relations, ontologies). It also connects to the Semantic Web via SPARQL endpoints for accessing, on demand, relevant knowledge missing from the current working memory.
• Knowledge Store: With knowledge represented in various formats, a requirement is to curate it in blocks. Due to dynamic environments, the knowledge needs to be updated and overwritten based on learned experience or the gathering of fresh, unseen knowledge.

• Learned Model: This serves as a store for updating learned models (safety and reliability weights over a task and a task group) from actuation feedback, via reinforcement learning, the monitoring module of the system, or both. Manual entry of initial weights and range thresholds makes the system robust to overfitting.

• Perception Module: A robot is equipped with multiple sensors required for the successful completion of a task and for its own operation. The perception module is designated with processing raw sensor readings and converting them into usable semantic information for the cognitive engine. As an example, a camera image is processed to identify objects and context from the robot's ego view.

• Actuation Module: Actuation of the robot is based on physical interaction with the environment. This is generally achieved by sending motor movement instructions from the software interaction module of the robot's base system (usually ROS, the Robot Operating System), on top of which the cognitive engine is running.

• Monitoring Module: This is the most important module (Section 3.5) from the point of view of safe and reliable task execution. The module oversees (a) monitoring the current environment state, including the robot and objects; (b) filtering the perception module for any conflicting sensed information; (c) providing realtime input to decision making.

The following methods shed light on the system objectives.

3.1 Using Safety and Reliability in Metric PDDL

Assigning reliability and safety to individual task action sequences in semantic forms like PDDL is a straightforward solution. The intricacies of safety and reliability can be included in the goal objective function, and the optimal path search can then be done by metric-supporting planners as follows:

    (:metric maximize (total-reliability))
    (:metric maximize (total-safety))

Task actions will have an effect variable like:

    :effect (and ... (increase (total-reliability) R)
                     (increase (total-safety) S))

A drawback of this technique is that one is required to know all the safety and reliability values for each task beforehand, which might not be possible in a cold-start scenario. If probability and uncertainty are involved, PPDDL (probabilistic PDDL) with unknown states and probabilities is a more suitable representation. Sensor failures can be modeled based on sensor error modeling techniques [Berk, M. et al. 2019], while task-level failures can be learned from actual task execution in real life or via realistic simulations in robot simulators such as Webots, Gazebo, Gibson and AI2Thor.
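As a concrete illustration, the Python sketch below renders a PDDL action whose effects accumulate learned per-task values R and S into the metric fluents shown above. The per-task values, the action template and the predicate names are illustrative assumptions, not output of a specific planner:

```python
# Hypothetical per-task (reliability, safety) values, e.g. learned from
# simulation runs, with cold-start defaults entered manually.
learned_values = {"move-step": (0.9, 0.8), "lift-object": (0.6, 0.5)}

def emit_pddl_action(name: str) -> str:
    """Render a PDDL action whose effects accumulate the learned
    reliability R and safety S into the planner's metric fluents."""
    r, s = learned_values[name]
    return (f"(:action {name}\n"
            f"  :parameters (?r - robot)\n"
            f"  :precondition (ready ?r)\n"
            f"  :effect (and (completed-{name} ?r)\n"
            f"               (increase (total-reliability) {r})\n"
            f"               (increase (total-safety) {s})))")

print(emit_pddl_action("lift-object"))
```

Regenerating the domain file whenever the learned table changes keeps the planner's metric search aligned with the latest observed task behaviour.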
3.2 Safety Priorities and Vulnerability Values

In the context of AI safety, it is often observed that total focus is given to human safety. However, in situations where a robot agent is interacting with the environment, it is also essential to give safety priorities to other living beings, to valuable objects and to the robotic agent itself (it also has value). This is achieved by maintaining a semantic database that contains the relative priorities and properties of entities, enabling decision making under the specified constraints. Diverse situations and applications will have different priority values; hence it is better to have a relative index instead of an absolute one. In the case of service robots, the end user should be provided with an interface to set custom values, as what humans value most varies from one person to another, which is a technical challenge. As an example, a simple cup might have little material value in terms of money, but if a memorable human event is attached to it, its relative value becomes higher. A naive strategy is to keep a safe distance from high-value entities while actuating. A significant aspect to consider is that entities in an environment can also be tagged with degrees of vulnerability. As an example, glass objects in the navigation path of a robot may prove vulnerable if the robot somehow deviates from its intended route (resulting in a collision). However, based on time and emergency constraints, the robot may have to execute tasks that carry some risk of failure and safety relaxations. A sample manual entry in tabular form:

    Class: Robot      | Vulnerability: 0.3 | Safety Priority: 0.6
    Class: Glass Vase | Vulnerability: 0.9 | Safety Priority: 0.4
    Class: Human      | Vulnerability: 0.6 | Safety Priority: 1.0

3.3 Task Reliability Graph

This section describes the concept of modeling and handling task failures in terms of a Task Reliability Graph (TRG). Some terminology is defined first. (a) Start/Init: the initial state of the world. (b) Goal: the expected state of the world after the complete plan is executed. (c) Task Node: a sub-task representing an action sequence that has pre-conditions and effects on the world; this can also be thought of as an action recipe. (d) Task Reliability: how reliable the task is; this value can be discrete (such as high, medium, low) or continuous in a specified range (say 0 to 1, with normalization). It can be learned by observing task executions in the real world or in computer simulations. For example, the task of moving straight can be considered reliable, while rotating on an axis may turn out to be unreliable under precision-demanding scenarios (like rotating by exactly r°). (e) Task Transition: the switch from one task to another following the planner-set sequence. When a task completes execution, control is given back to the cognitive engine from the actuator, which then initiates the next task in the planned pipeline. Some task transitions are seamless, like between moving straight and rotation. However, a transition between the actions 'turning off lights' and 'moving ahead' is not reliable, as movement in the dark is prone to failure without sensors that work in the dark. Unreliable transitions mean there is a high risk of failure. (f) Task Transition Reliability: how reliable the transition from one task to another is; its numeric values can likewise be learned by monitoring task transitions in the real world and in simulations. (g) Safe Zone: a world state in which the chance of the robot doing an unintended action is almost nil. (h) Risk Zone: a world state in which there is a chance of task failure, but recovery or rollback to a safe zone is still possible. (i) Failure Zone: a world state in which the robot cannot recover and needs external intervention; this world state is more prone to safety violations.

[Figure 2: Task Reliability Graph and Reliability Zones]

We explain the TRG with the help of a simple telepresence scenario. Some of the sub-tasks can be: open a door (T1), close a door (T2), turn on lights (T3), turn off lights (T4), move a step (T5; this can be further subdivided into sub-tasks of vector movements), apply pressure to hold an object (T6), lift an object (T7), move an object (T8). Suppose the end user at a remote location issues a command to find an object, a 'mobile phone' left somewhere at home. In this case, the transition T4 → T5 is risky; T6 → T8 is also risky if the object (here the phone) cannot be moved without lifting. Had the object been something heavy like a portable cupboard, lifting it might cause the robot to lose balance and damage itself. If the robot has sensors that work in the dark (like an infrared camera), then T4 → T5 is not risky. So each task transition is characterized by the robot's current perception capacity as well as the task and environment state. If the current scene lacks dynamic obstacles, risk factors are minimized automatically. Hence, the reliability values of tasks and task transitions depend on the scenario and its constraints, and accordingly the optimal task path for the same goal (user instruction) will vary.

Fig. 2 shows an instance of a Task Reliability Graph where the objective is to reach the Goal node from the Start node. The optimal path for a given problem is found by looking simultaneously at multiple task nodes (having reliability weights), their connecting edges (having task transition reliability weights), and the safety values assigned to the edges and nodes. Depending on the constraints (limited time, no risk), the optimal path will differ. To simplify understanding, reliability weights are not considered in the following cases.

Case 1: If safety is the highest priority (non-negotiable): no path exists until safety is relaxed.
Case 2: If time is the highest priority (emergency): Start → Task1 → Task3 → Goal.
Case 3: If time and safety have relaxed priority: Start → Task0 → Task4 → Task5 → Task7 → Task3 → Goal. (This path has all safe 'green' transitions, but two risky tasks.)
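The sketch below is one plausible realization of TRG path selection for the lights/move example, not the paper's definitive algorithm. The reliability priors are illustrative; in practice they come from learned models, and the edge safety flags would be derived from the safety priorities and vulnerability values of Section 3.2:

```python
# Node task reliabilities and edge (transition reliability, is_safe) priors.
node_rel = {"Start": 1.0, "T3": 0.95, "T4": 0.9, "T5": 0.7, "Goal": 1.0}
edges = {
    ("Start", "T4"): (0.9, True),   # turn off lights
    ("T4", "T5"): (0.8, False),     # moving in the dark: risky transition
    ("Start", "T3"): (0.8, True),   # turn on lights first
    ("T3", "T5"): (0.85, True),
    ("T5", "Goal"): (0.9, True),
}

def best_path(src: str, dst: str, require_safe: bool):
    """Depth-first search over the TRG, scoring each path by the product
    of node and transition reliabilities; unsafe transitions are excluded
    outright when safety is non-negotiable."""
    best = (0.0, None)
    def dfs(node, score, path):
        nonlocal best
        if node == dst:
            best = max(best, (score, path))
            return
        for (a, b), (rel, safe) in edges.items():
            if a == node and b not in path and (safe or not require_safe):
                dfs(b, score * rel * node_rel[b], path + [b])
    dfs(src, node_rel[src], [src])
    return best

print(best_path("Start", "Goal", require_safe=True))   # lit route via T3
print(best_path("Start", "Goal", require_safe=False))  # dark route via T4
```

Toggling the safety requirement changes the selected route, mirroring how the constraint cases above yield different optimal paths for the same goal.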
It is important to keep note of visited nodes and of where failures occur more often in graph cycles. Another parameter to consider is the cost of rollback: how far back from the current node in the graph the agent needs to travel in order to reach a safe state. A further point: if the robot agent does not have suitable sensors to monitor the state needed to carry out a vulnerable task, then such a task by default falls in the Risk Zone.

3.4 State Reliability

Jointly monitoring the path to the goal in solution space as well as the state variables is a better approach than relying on only one aspect. As an example, tasks can fail if the battery powering the robot (an object state) gets drained, even though there were no issues in the reliability of the task at hand and its transitions. For each goal, a task path is created that contains the states of variables. Simultaneously, the relevant world state (a subset can be extracted by looking at the problem, the goal and the possible paths) needs to be monitored, and reliability can be assigned to (a) each relevant object state and (b) a relevant subset of world states representing a possible state. Such a subset may not map to a specified task template but can be treated as an unknown state; it should not be generalized into a single dead state. If there are m state variables with n possible discrete values each, the number of reachable world states will be much less than the maximum possible, as the co-occurrence of certain value combinations [m1(n1), m2(n2)] is impossible due to semantic and logical constraints. For example, a state variable 'light' in the 'off' state will make the camera-sensed state variables null, since in darkness there is no visibility. These constraints can be derived from ontological descriptions and common-sense knowledge, and can also be specified in PDDL under the :constraints section. This leads to task reliability prediction as a weighted objective:

    maximize ( w1 · world state reliability
             + w2 · individual state reliability
             + w3 · task transition reliability
             + w4 · task execution reliability )
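A direct translation of this objective into code is straightforward; the weights below are illustrative defaults (not values given in the paper) and would in practice be tuned from monitored executions, for example via the Learned Model store:

```python
def task_reliability_score(r_world: float, r_state: float,
                           r_transition: float, r_execution: float,
                           w=(0.2, 0.2, 0.3, 0.3)) -> float:
    """Weighted reliability prediction per the objective above."""
    w1, w2, w3, w4 = w
    return (w1 * r_world + w2 * r_state
            + w3 * r_transition + w4 * r_execution)

# A healthy world and object state cannot mask a shaky task transition.
print(task_reliability_score(0.95, 0.9, 0.4, 0.8))  # ≈ 0.73
```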
3.5 Task Monitoring and Decision Making

Any machine learning model is bound to fail at times, as its predictions cannot be 100% certain; a model is only as good as its features [Banerjee, S. et al. 2016]. The fundamental aspect of safe and reliable task planning is to minimize the level of uncertainty. If a world model were exhaustive, uncertainty would be zero; in real life, however, it is very difficult to have an exhaustive state space description on which an agent can actuate. So a hybrid approach that entangles uncertainty handling in the core decision-making process is a practical way out:

Algorithm 1: Safe and Reliable Task Planning
Result: Task success or failure under constraint set C
Parameters:
  perception ← sensor input stream
  actuation ← movement or manipulation by the agent
  knowledge ← link to the semantic knowledge store
  software ← link to software modules and libraries
  goal ← target goal state or final task state to reach
  CS ← current state of the agent
Begin:
  plan ← software.plan(perception, knowledge, goal)
  TRG ← initialize Task Reliability Graph with priors
  while goal ≠ CS and pre-conditions satisfied do
    CS ← plan.nextTaskStep()
    if constraints in CS w.r.t. TRG ⊆ set C then
      TRG.evaluate(perception)  // check for changes
      if changes detected in world state then
        S1 ← software.RL.evaluate(perception)  // next step from the reinforcement-learned model
        S2 ← next step from TRG after evaluating reliability of world state, CS, task transition and task execution
        actuation ← voting(S1 ∩ S2)
      else
        actuation ← voting(TRG.nextActuation() ∩ software.RL.nextActuation())
      end
      TRG.update()          // weights of edges and nodes
      software.RL.update()  // update rewards from current scene and task status
      knowledge.update()    // object state values
    else
      plan ← software.replan(perception, TRG)
    end
    if goal reached or exit criterion met then
      actuation ← STOP; wait for next command
    end
  end
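The voting step in Algorithm 1 is left abstract. The sketch below shows one plausible voting rule over the RL suggestion (S1) and the TRG suggestion (S2); the (action, risk) tuple encoding and the 0.5 risk threshold are assumptions for illustration, not fixed by the paper:

```python
def vote(rl_step, trg_step):
    """Consensus voting over S1 (RL) and S2 (TRG) from Algorithm 1:
    actuate on agreement, otherwise take the lower-risk suggestion,
    and stop when both options are too risky."""
    if rl_step == trg_step:
        return rl_step                      # S1 ∩ S2 non-empty: actuate
    if min(rl_step[1], trg_step[1]) > 0.5:  # both options risky:
        return ("STOP", 0.0)                # halt and wait for re-planning
    return min(rl_step, trg_step, key=lambda s: s[1])  # safer option

print(vote(("move-forward", 0.2), ("move-forward", 0.2)))  # agreement
print(vote(("move-forward", 0.7), ("rotate-left", 0.3)))   # safer TRG step
```

Stopping on disagreement under high risk keeps the robot in the Safe or Risk Zone rather than letting a single uncertain model push it into the Failure Zone.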
4 Conclusion

The intention of this paper is to help ensure that future AI systems that use task planning are not just optimistically safe but robustly, verifiably safe, because they were designed and built with safety and reliability in mind from the ground up. The strategies laid out are generic enough for widespread adoption across safe task planning use cases. Experimentation on multiple use cases, in both simulation and real-life situations, will yield more insights into the proposed approaches. Future work will include a mathematical formulation with safety guarantees [Polymenakos, K. et al. 2019] to model agent (robot) actuation in an uncertain world.

5 References

[Amodei et al. 2016] Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[Axelrod, B. et al. 2018] Provably safe robot navigation with obstacle uncertainty. The International Journal of Robotics Research 37.13-14: 1760-1774.
[Banerjee, S. et al. 2016] Towards wide learning: Experiments in healthcare. ML4Health, NIPS 2016. arXiv preprint arXiv:1612.05730.
[Berk, M. et al. 2019] Reliability assessment of safety-critical sensor information: Does one need a reference truth? IEEE Transactions on Reliability 68.4: 1227-1241.
[Dhillon, B.S. 2015] Robot System Reliability and Safety: A Modern Approach. CRC Press.
[Espinoza et al. 2019] Towards an AI Safety Landscape: An Overview. https://www.ai-safety.org.
[Fisac, J.F. et al. 2018] Probabilistically safe robot planning with confidence-based human predictions. arXiv preprint arXiv:1806.00109.
[García et al. 2015] A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16.1: 1437-1480.
[Gayathri et al. 2018] Ontology based knowledge representation technique, domain modeling languages and planners for robotic path planning: A survey. ICT Express 4.2: 69-74.
[Gerevini et al. 2009] Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence 173.5-6: 619-668.
[Hoy et al. 2015] Algorithms for collision-free navigation of mobile robots in complex cluttered environments: A survey. Robotica 33.3: 463-497.
[Ingrand et al. 2017] Deliberation for autonomous robots: A survey. Artificial Intelligence 247: 10-44.
[Kern-Isberner, G. et al. 2017] Many facets of reasoning under uncertainty, inconsistency, vagueness, and preferences: A brief survey. KI - Künstliche Intelligenz 31.1: 9-13.
[Krakovna et al. 2018] Measuring and avoiding side effects using relative reachability. arXiv preprint arXiv:1806.01186.
[Mukherjee, D. et al. 2013] Towards efficient stream reasoning. On the Move to Meaningful Internet Systems, Springer, pp. 735-738.
[Polymenakos, K. et al. 2019] Safety guarantees for planning based on iterative Gaussian processes. arXiv preprint arXiv:1912.00071.