=Paper= {{Paper |id=Vol-1484/paper14 |storemode=property |title=Skill-based Exception Handling and Error Recovery for Collaborative Industrial Robots |pdfUrl=https://ceur-ws.org/Vol-1484/paper14.pdf |volume=Vol-1484 |dblpUrl=https://dblp.org/rec/conf/iros/BeckSFNK15 }} ==Skill-based Exception Handling and Error Recovery for Collaborative Industrial Robots== https://ceur-ws.org/Vol-1484/paper14.pdf
                    Skill-based Exception Handling and Error Recovery
                             for Collaborative Industrial Robots
                                   A. B. Beck, A. D. Schwartz, A. R. Fugl, M. Naumann, B. Kahl






   Abstract— Moving robots from their carefully designed and                 recovery approach that allows non-robot expert users to
encapsulated work cells into the open, less structured human                 operate a robotic system embedded in a human-centric
workspace for collaboration with workers requires robust                     workspace. We briefly introduce our execution model, detail
error detection and recovery strategies. Foreseeing all possible             the Extensible Markov Model based Situation Assessment,
uncertainties and unexpected events and programming                          which forms the basis for Exception Handling, and the Error
recovery actions at setup time is infeasible. Online learning of             Recovery module employing a Bayesian network and a beta-
nominal execution behaviour and automatic detection of                       binomial inference algorithm. The proposed system has been
anomalies using an Extensible Markov Model, combined with                    implemented in a pick & place and in an assembly work cell,
interactively trained Bayesian networks for mapping                          both of which are presented at the end.
anomalies to error causes and recovery actions, enables
automatic recovery from previously experienced errors. A                                           II. RELATED WORK
three-layered user-friendly model of errors—causes—
responses and a simple GUI allow non-expert users to define                  Research in exception handling is related to the area of error
new recovery activities and error causes when anomalies                      or fault recovery [17]. Error recovery has been defined as
occur that are not yet handled.                                              “the process by which the system returns to a state where
                                                                             production can restart after an abnormal and disruptive
                          I. MOTIVATION
                                                                             condition has occurred” [23]. For a robot coworker to
    Today’s robot systems for industrial applications rely on                effectively handle an exception, whether through informing
a structured environment to avoid errors. Parts, fixtures,                   the human worker or resolving the problem by itself, the
tools and stations have defined positions and the workspace                  types of faults that typically occur in the manufacturing
is encapsulated to avoid intruders that could possibly                       robotic assembly cases need to be understood. Fault
endanger this defined environment. Expected exceptions                       taxonomies have been presented in other related fields,
from the nominal case that were either foreseen during the                   including mobile robots [7], computing [3], autonomous
planning of the robot system, or occurred during the setup                   robots in RoboCup [21], workflow systems [16], service-
phase of the system are coped with by integrating additional                 oriented architecture [6], and web service [8]. Reports show
sensors, adapting tool-, fixture and part geometries and                     that many errors in manufacturing systems, including CNC
adding additional branches to the robot program to cope with                 machines, are hardware related and that approximately 60%
these deviations. Furthermore, as many robotic systems are
                                                                             of all stoppages are due to tool breakdown [23]. However,
complicated, any exceptions and breakdowns occurring after
system setup often require external technicians or engineers                 there has been a lack of study on the likelihood of common
to diagnose and solve problems.                                              errors and exceptions occurring during assembly tasks
                                                                             involving collaborative robots. One of the reasons can be
    Such strictly controlled and carefully designed work                     that robot coworkers have not yet proven to be robust
cells are only economically feasible if the designed robot                   enough for industry application to be studied and
system will run unobstructed for a long time. Small and mid-                 generalized based on real assembly cases [14].
sized enterprises (SMEs) are often characterized by a much
more agile production style and consequently rely on human                               III. SKILL-BASED EXECUTION MODEL
workspaces. Moving robots out of their strictly controlled                       At the base of the system is a Skill Execution Engine,
and carefully designed spaces into human workspaces,                         which allows a more goal-oriented task description than
which are by nature unstructured environments with a high                    strict motion based programming or planning. Without
degree of uncertainty, requires significantly enhanced                       going into details of the skill-model [1], we assume skills to
robustness towards unforeseen events and geometric or                        be independent, sensor-based motion or handling primitives
other uncertainties. (The additional need for safety measures                that adapt themselves to position uncertainties and other
to protect the human co-worker from injuries is out of scope                 deviations from an ideal state using built-in sensing and
of this work, see e.g. [22], [12] and many others.) An                       monitoring as well as (limited) internal error recovery.
SME-suitable robot system therefore needs semi-automatic                     Robot tasks are constructed by chaining skills and control
exception handling and error recovery capabilities that allow                flow instructions, forming a state machine [2] based on
non-expert users to manage exceptions (internally and                        SCXML1. While skills detect deviations from their expected
externally triggered) occurring in daily operation. We                       performance and report these, the skill executor by itself
propose a novel skill-based exception handling and error                     does not provide any error recovery functionality. Features

   * The research leading to these results has been funded by the European Union’s seventh framework
program (FP7/2007-2013) under grant agreements #608604 (LIAA: Lean Intelligent Assembly Automation) and
#287787 (SMErobotics: The European Robotics Initiative for Strengthening the Competitiveness of SMEs in
Manufacturing by integrating aspects of cognitive systems).
   A. B. Beck and A. R. Fugl are with the Danish Technology Institute. E-mail: anbb@dti.dk and arf@dti.dk
   M. Naumann is with the Fraunhofer Institute for Production Systems and Automation. E-mail:
Martin.Naumann@ipa.fraunhofer.de
   B. Kahl is with the Gesellschaft für Produktionssysteme GmbH Stuttgart. E-mail: bjoern.kahl@gps-stuttgart.de
   1 Apache Commons SCXML executor, http://commons.apache.org/proper/commons-scxml/.


     FinE-R 2015                                                        Page 5                            IROS 2015, Hamburg - Germany
     The path to success: Failures in Real Robots                                                         October 2, 2015
of the skill executor that allow the implementation of error      A. Situation Model
recovery functionality at higher layers are:                          Situation Assessment uses a Situation Model as a
      • The skill executor knows and publishes the current        template description to fuse together the different data points
        state of the system at any time. This allows an error     for learning a Situation. The components of the Situation
        recovery module to relate errors on the one hand to       Model (𝑑𝑖 in (2)) are real number data, which can come from
        specific skill models and on the other hand to            any source and have any meaning. In our experience,
        specific application steps and therefore to draw          combining space and time is critical to the success of
        conclusions like “this is an error that is very typical   learning a skill. For instance, when learning a skill using a 6D
        for a pick operation” or “this is an error that           F/T sensor, the Situation Model 𝑠𝑎 could be defined as in (3).
        occurred already in the past at this specific                 𝑠𝑎 = [𝑝𝑟𝑖𝑚𝑖𝑡𝑖𝑣𝑒, 𝐹𝑥, 𝐹𝑦, 𝐹𝑧, 𝑇𝑥, 𝑇𝑦, 𝑇𝑧] (3)
        execution step of the application”.
                                                                      The component 𝑝𝑟𝑖𝑚𝑖𝑡𝑖𝑣𝑒 of 𝑠𝑎 is a data point that
      • The skill executor has an interface for an error          uniquely identifies the current primitive being executed in
        recovery module to stop and later continue the            the skill. In this case, the unique primitive ID provides the
        execution of the skill based application program          understanding of time while the understanding of space is
        thereby allowing worker interaction to recover from       provided by the F/T data. By using the primitive ID we can
        errors detected by an error recovery module.              learn a skill in a time-invariant way. This means that SA will only
      • The skill formalism used by the skill executor is         learn the sequence of the events and is invariant towards the
        built on the concept of reusable hierarchical skills      duration of the execution of specific primitives. We have
        that are easy to enhance or adapt. It is therefore        found this feature particularly useful when the duration of
        easily possible to include additional mechanisms          the primitives or skills is stochastic. Should it be necessary
        into an existing skill model to cope with errors that     to catch anomalies in relation to when events occur (e.g. too
        could be detected by the system but just have not         early or late), the primitive ID in (3) can be substituted with
        been considered yet.                                      a time data point. In our research, we have also
                                                                  successfully applied SA to monitoring digital inputs, such as
    The Situation Assessment (SA) constantly monitors the         the state of one or more grippers. Throughout the rest of the
overall situation (robot task execution) using data published     paper, we will use the following Situation Model for
by the skill executor as well as by additional sensors            implementation and testing:
dedicated to situation assessment. Deviations flagged by
the SA are further examined by the Exception Handling                 𝑠𝑏 = [𝑝𝑟𝑖𝑚𝑖𝑡𝑖𝑣𝑒, 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑜𝑝𝑒𝑛, 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑐𝑙𝑜𝑠𝑒𝑑] (4)
(EH), which devises a possible cause and corrective               where 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑜𝑝𝑒𝑛 and 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑐𝑙𝑜𝑠𝑒𝑑 are binary outputs of
measure, potentially involving user interaction. The whole        reed switches of the gripper: 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑜𝑝𝑒𝑛 is 𝑡𝑟𝑢𝑒 when the
system of skill executor, situation assessment and exception      gripper is fully open and 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑐𝑙𝑜𝑠𝑒𝑑 is 𝑡𝑟𝑢𝑒 when the
handling is collectively referred to as “Exception Handling       gripper is fully closed. Our assumption is that the gripper is
Framework” or “EHF”.                                              grasping an object when both readings are 𝑓𝑎𝑙𝑠𝑒, indicating
               IV. SITUATION ASSESSMENT                           the gripper is neither fully open nor closed.

    The role of Situation Assessment (SA) is to learn and         B. Data Processing and Clustering
monitor the (correct) skill execution and detect non-nominal          The Situation Model serves as a template describing
conditions. Deviations from the learned, nominal behaviour        which sensors SA should fuse together into one single state.
are interpreted as Anomalies, which are passed on to the          In general, all data points in the Situation Model have to be
Exception Handler (section V). Our implementation of SA           real numbers. This allows the computation of one single
is based on prior work by [4] and [5], where SA was applied       metric for each 𝑋𝑖 in (1). We have so far used the Jaccard
to mobile robotics. We implemented and expanded SA to             similarity coefficient as a method for clustering similar
learn skill based execution in a collaborative robotic system.    states. Through experimentation, we have found the
To learn how to perform a skill correctly, SA captures the        algorithm to be useful despite its simplicity.
essence of the skill by learning the timing and sequence of
events that make up the skill. Our approach is to generate        C. Dynamic Learning in Situation Assessment
one parameterized model that includes parameters in the                SA can autonomously learn a skill without the user
space and time domain. SA learns the sequence of events           having to manually specify the states of a skill. We have
within a skill execution by learning a set of parameters with     implemented a spatiotemporal model that allows for online
a temporal component, recording the transition from one           dynamic learning of states over time. For this purpose, we
instantiation of the parameters to the next:                      are currently using the Extensible Markov Model (EMM) as
                                                                  it is useful for online learning of sequences of states [10]. An
   𝑝(𝑋) = 𝑝(𝑋1, 𝑋2, … , 𝑋𝑛)        (1)
where 𝑋, a Situation Model, denotes a set of parameters (a        algorithm can be seen in Figure 1. In this example, a robot is
state), 𝑛 denotes a discrete step in time, and 𝑝, a Situation,    picking up a nut from a table and placing it on a pipe in a
denotes the complete distribution of all the states within a      single nonrecurring operation (therefore an open-ended
skill. Each state 𝑋𝑖 of 𝑋 is parameterized:                       chain). The EMM is also useful in learning looped tasks.
   𝑋 = [𝑑1, 𝑑2, … , 𝑑𝑚]      (2)
where 𝑑𝑖 are data components such as sensor values or
robot’s internal state values.
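
The learning scheme of this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name SituationSketch, the encoding of a Situation Model sample as a set of (component, value) pairs, the similarity threshold, and the choice to ignore repeated observations of the same state (to mimic the time invariance described above) are all assumptions made for illustration. Samples are clustered into states with the Jaccard similarity coefficient, and transitions between consecutive states are counted as in a simple Markov chain.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class SituationSketch:
    """Hypothetical online learner: clusters samples into states and
    counts transitions between consecutive, distinct states."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold            # assumed similarity cut-off
        self.states = []                      # one representative per state
        self.transitions = defaultdict(int)   # (from_state, to_state) -> count
        self.current = None

    def _closest(self, features):
        """Return the index of the most similar known state, or None."""
        best, best_sim = None, -1.0
        for idx, rep in enumerate(self.states):
            sim = jaccard(features, rep)
            if sim > best_sim:
                best, best_sim = idx, sim
        return best if best_sim >= self.threshold else None

    def learn(self, sample):
        """Learning mode: add unseen states and count state changes."""
        features = frozenset(sample.items())
        idx = self._closest(features)
        if idx is None:
            self.states.append(features)
            idx = len(self.states) - 1
        if self.current is not None and idx != self.current:
            self.transitions[(self.current, idx)] += 1
        self.current = idx
        return idx

    def is_anomaly(self, sample):
        """Monitoring mode: a sample matching no learned state is anomalous."""
        return self._closest(frozenset(sample.items())) is None


# Samples follow Situation Model (4): primitive ID plus two reed switches.
nominal = [
    {"primitive": 1, "gripper_open": True, "gripper_closed": False},
    {"primitive": 1, "gripper_open": True, "gripper_closed": False},   # same state
    {"primitive": 2, "gripper_open": False, "gripper_closed": False},  # grasping
    {"primitive": 3, "gripper_open": True, "gripper_closed": False},
]
sa = SituationSketch()
for s in nominal:
    sa.learn(s)

# Gripper fully closed during primitive 2: never seen while learning.
dropped = {"primitive": 2, "gripper_open": False, "gripper_closed": True}
print(len(sa.states), dict(sa.transitions), sa.is_anomaly(dropped))
```

Because a repeated state adds no transition, the learned chain depends only on the order of events, not on how long each primitive takes. Replacing the primitive ID with a time value would make the sketch sensitive to timing instead.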




                                                                          an error. The Fault node models the root cause of the
                                                                           Anomaly and the Response node models the solution to the
                                                                           Fault. This model resembles the diagnosis model used by
                                                                           physicians when examining a patient: Based on symptoms
                                                                           (here: the detected error) an illness is inferred (here the
                                                                           fault) and a therapy decided (here the response). The
                                                                           intermediate step of a fault is necessary, since one and the
                                                                           same observed error (symptom) can have multiple causes.
                                                                           For example an unexpected gripper state can be due to a
                                                                           failed grasping operation, a missing object at the pickup
                                                                           position or a defective gripper itself.
                                                                           B. Bayesian Network

 Figure 1. Example of learning a task. The upper left corner shows the
   temporal sequence of states the skill consists of. These states were
    learned online while the robot performed the task. A full video is
    available at https://www.youtube.com/watch?v=-CKbdQ3ocQo.

D. Anomaly detection
    SA has two modes of operation: learning and detection
during execution. In the learning mode, SA monitors the
data points specified in the Situation Model and builds the
Situation for the skill that is being learned. During execution
of the same skill, SA loads the saved Situation and applies
the same clustering process as during learning. However,
should the clustering of the data result in a new state in (1),
then SA interprets this as an anomalous state having
occurred and issues an Anomaly warning. Processing and
handling the Anomaly is the task of the Exception Handler
                                                                               Figure 2. Inference of cause and solution to Gripper Open anomaly.
(EH) module.                                                                   Both error nodes are dependent on both anomaly nodes, allowing the
                   V. EXCEPTION HANDLER                                        Bayesian network to further strengthen the belief about the cause of
                                                                               an anomaly. Simulated in GeNIe2.
    The task of EH is to receive an Anomaly from SA and
provide a suggested solution that is most likely to solve the                 Figure 2 is an example of a Bayesian network with three
problem. For each robotic system, EH maintains a                          Exception Scenarios for the two Anomaly nodes of the
hierarchical four-layered Bayesian network with all
                                                                          sensors 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑜𝑝𝑒𝑛 and 𝑔𝑟𝑖𝑝𝑝𝑒𝑟𝑐𝑙𝑜𝑠𝑒𝑑 in (4). Both Error
exceptions and solutions relevant to that cell. The
                                                                          nodes are dependent on both Anomaly nodes, allowing the
hierarchical structure allows EH to reason about the most
                                                                          Bayesian network to further strengthen the belief about the
suitable solution to a problem. EH provides the suggested
                                                                          cause of an Anomaly (simulated in GeNIe2). The first two
solution to the user along with all other possible solutions.
                                                                          scenarios with nodes 𝑠1 = {1,3,5,8} and 𝑠2 = {1,3,6,9}, offer
The user is free to select the suggested solution, any other
                                                                          two Responses to the Gripper Open Anomaly while the third
solution or to create a new solution. The selection is stored
                                                                          scenario 𝑠3 = {2,4,7,10}, offers a single Response to a
in EH as a sample of user solution preference. Such samples
                                                                          Gripper Closed Anomaly. The numbers in curly braces
are used in priming the network for inference with future
                                                                          indicate the node number in Figure 2. In this example we are
anomalies. With the feedback of user samples, a closed
                                                                          modelling two faults {5,6} and Responses {8, 9} for the
preference-learning loop is formed to provide suggestions
                                                                          Gripper Open Anomaly. If a gripper is unexpectedly open
for solutions to future anomalies. In this section, we provide
                                                                          (Gripper Open = true, Gripper Closed = false), we could
a detailed description of EH and begin with the role of the
                                                                          interpret that as either a pneumatics failure (e.g. loss of air
Exception Scenario (ES) in EH.
                                                                          pressure) that can be solved by checking and replacing the
A. Exception Scenario                                                     air supply {5,8}, or an actuator failure (e.g. broken gripper)
    The Exception Scenario (ES) is designed as a four-                    that can be solved by repairing the gripper {6,9}. In the
layered model consisting of Anomaly, Error, Fault and                     reverse case of a closed gripper, we could interpret the
Response, inspired by work in [18]. The hierarchy is a four-              failure as there was no object to grip and the solution is
layered binary Bayesian network that facilitates inferring the            simply to replace the missing object. In Figure 2, the Faults
most likely Response (solution) to an Anomaly (a                          {5,6} are modelled as belonging to the same Error, Gripper
deviation); an example is shown in Figure 2. At the lowest                Operations Error {3}. This allows the network to learn user
level of the network, Anomaly nodes model anomalies                       selections for a specific (Fault, Response) pair over other pairs
detected by SA. Each Anomaly node corresponds to a data                   belonging to the same Error node. The network is thereby
component (𝑑𝑖 of (2)) in the Situation Model. Above                       able to encode knowledge specific to individual user
Anomaly, the Error node models which kind of error the                    environments.
Anomaly is and if the Anomaly should even be considered

   2
     Figure 2 shows a screen capture of GeNIe, a Bayesian modelling
environment developed by the Decision Systems Laboratory of the
University of Pittsburgh. Available at http://genie.sis.pitt.edu
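
The three Exception Scenarios of Figure 2 can be written down as plain data. The sketch below is illustrative only: the ExceptionScenario class and the candidates function are hypothetical names, node numbers follow the text, and labels are given only for nodes that the text names explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionScenario:
    """One path Anomaly -> Error -> Fault -> Response, by node number."""
    anomaly: int
    error: int
    fault: int
    response: int


# Labels taken from the text; nodes not named there are left unlabelled.
LABELS = {
    1: "Gripper Open (Anomaly)",
    2: "Gripper Closed (Anomaly)",
    3: "Gripper Operations Error",
    5: "Pneumatics failure (Fault)",
    6: "Actuator failure (Fault)",
    8: "Check and replace air supply (Response)",
    9: "Repair gripper (Response)",
    10: "Replace missing object (Response)",
}

SCENARIOS = [
    ExceptionScenario(1, 3, 5, 8),    # s1
    ExceptionScenario(1, 3, 6, 9),    # s2
    ExceptionScenario(2, 4, 7, 10),   # s3
]


def candidates(anomaly_node):
    """All scenarios that offer an explanation for the given Anomaly node."""
    return [s for s in SCENARIOS if s.anomaly == anomaly_node]


for s in candidates(1):
    print(LABELS[s.fault], "->", LABELS[s.response])
```

Keeping the Fault as its own field is what lets one Anomaly map to several (Fault, Response) pairs, as in the Gripper Open case.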


                                                                  confirming and rejecting the selection of the node. The Beta
                                                                  distribution is a conjugate distribution to the Binomial
C. Inference                                                      distribution, thereby offering analytical tractability of the
    The process of inferring a Response to an Anomaly in          Bayesian inference process. The conjugate property ensures
the Bayesian network is the inference process of the EH.          that when updating the prior Beta distribution (7) with new
This process is an implementation of Bayes’ theorem:              evidence following the Binomial distribution, the resulting
                                                                  posterior distribution is also a Beta distribution (8).
   𝑝𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 ∝ 𝑝𝑟𝑖𝑜𝑟 ∙ 𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 (5)
   We have implemented Bayes’ theorem in three steps:                 (𝑝 | 𝛼∗, 𝛽∗) = Be(𝛼∗, 𝛽∗) (8)
                                                                      𝛼∗ and 𝛽∗ are, respectively, the new numbers of selections
      • Calculate prior probabilities
                                                                  and rejections for the specific node. Thus, obtaining the
      • Introduce evidence to network                             posterior distribution in (8) becomes simply a matter of
                                                                  adding the new confirmations and rejections to the existing
      • Infer posterior probabilities                             counts, and then calculating the mean (𝜇) and variance (𝑣𝑎𝑟) (9,10).
   In the following, we describe each of these steps.                 𝜇 = 𝛼 / (𝛼 + 𝛽) (9)
D. Prior probabilities                                                𝑣𝑎𝑟 = 𝛼𝛽 / ((𝛼 + 𝛽)² (𝛼 + 𝛽 + 1)) (10)
    A prerequisite for performing inference is the calculation
of prior probabilities. As described in section IV, a feature
of the EHF is to learn the user-preferred solution of a given         In Figure 3, examples of Beta distributions for different
anomaly. When the user selects a specific Exception               values of 𝛼 and 𝛽 are shown. Distribution 1: 𝐵e(𝛼 = 1, 𝛽 =
Scenario (i.e. an Error, a Cause and a Solution) to solve a       1) is a uniform distribution offering an uninformative prior
problem, it is fed back to the database as a sample of the user   with a mean, 𝜇 = 0.5 and a high variance (uncertainty) due
selection, thus learning the preference of selecting this         to the low sample size. In this case, the posterior will largely
Exception Scenario for a specific Anomaly. The sample data        be determined by the data. Distribution 4: 𝐵e(𝛼 = 30, 𝛽 = 5)
is used to calculate the prior probability for each node of the   has 𝜇 = 0.86 and a smaller variance, thus providing a
network. We treat calculating the node’s prior probability as     comparably less uncertain estimate of the user selection
an inference process that adds another layer of Bayesian          preference, 𝑝. The sequence of graphs 1-4 in Figure 3 can
inference as described in (5). We introduce the sample data       be seen as an example of a continuous learning cycle,
from user selection of Exception Scenarios as the evidence        starting with no knowledge of user selection (a uniform
to infer each node’s posterior probability. Each node of the
Bayesian network is a binary random variable modelling an
event that either occurs or not. For instance, if the user
selects the ES {1,3,5,8} in Fig. 2, then the user is confirming
that the specific ES solved the problem (e.g. that a Gripper
Open Anomaly did happen, it was caused by missing air
pressure and the solution was to resupply the air). At the
same time and equally important, the user is also confirming
that alternative events {4,6} to ES {1,3,5,8} did not occur.
Thus, with every selection of an ES, EH registers the
confirmed nodes on all levels of the ES, as well as the
rejected nodes. The process of selecting any node in the
Bayesian network over time can be viewed as a Bernoulli                Figure 3. Four 𝐵𝑒(𝛼, 𝛽) distributions for different values of 𝛼, 𝛽.
process following a binomial distribution as in (6).                  Note that 𝛼, 𝛽 > 0, thus 𝛼 = 𝛽 = 1 is equal to no samples. 1: 𝐵𝑒(1,1), 𝜇
                                                                      = 0.50, 𝑣𝑎𝑟 = 0.083. 2: 𝐵𝑒(2,1), 𝜇 = 0.67, 𝑣𝑎𝑟 = 0.056. 3: 𝐵𝑒(15,5), 𝜇
   𝑋~Binom(𝑛, 𝑝) (6)
                                                                           = 0.75, 𝑣𝑎𝑟 = 0.0089. 4: 𝐵𝑒(30,5), 𝜇 = 0.86, 𝑣𝑎𝑟 = 0.0034.
where 𝑋 is the number of times a specific node has been
selected. 𝑛 is the number of samples drawn in the sequence.       distribution with no samples) towards more informative
If this process is sampled sufficiently, a distribution           distributions 2-4 as the sample size increases. When a new
reflecting the user selection can be inferred from the sample     node is created with no samples available (𝛼 = 𝛽 = 1), the
set. However, in many cases it is not possible to provide a       Beta distribution is uniform. However, to avoid the
sample set of sufficient size and inference will be subject to    uninformative uniform distribution we propose to query the
uncertainty. To model this uncertainty, we model the user         user to provide a subjective estimate of the selection (the
selection for each node as a hyper-parameter 𝑝, thereby           mean) of this node along with a confidence level (the
modelling the user selection as a random variable itself and      variance). Using the equations for the mean (9) and variance
creating a hierarchical Bayesian model for calculating the        (10), suitable values for 𝛼 and 𝛽 can then be calculated.
prior probability [11]. This approach uses the samples of
user selection as a likelihood function providing evidence to     E. Introducing evidence
the inference process. Given the binomial likelihood, we              The Bayesian network described in section V.B and Fig.
have chosen the Beta distribution as the prior distribution       2 receives evidence in the form of Anomaly information
(7).                                                              gathered by SA. In the example shown in Figure 2, SA has
                                                                  detected that the gripper was unexpectedly fully open (thus
   (𝑝 | 𝛼, 𝛽) = Be(𝛼, 𝛽)     (7)                                  providing evidence that Gripper Open = true, Gripper
    In (7), the user selection is modelled as the                 Closed = false). The evidence is in practice introduced to the
hyperparameter 𝑝, drawing samples from the Beta                   network by clamping the two nodes to their respective
distribution. 𝛼 and 𝛽 are, respectively, the numbers of samples   values.
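
The conjugate update described in this section reduces to simple counting, which the following Python sketch illustrates. The function names are hypothetical and this is not the authors' code. The mean and variance are the standard formulas for a Beta distribution, and they reproduce the values in the caption of Figure 3. The last function inverts them by the method of moments, which is one possible way (an assumption here) to turn a user's estimate and confidence into 𝛼 and 𝛽.

```python
def beta_mean(alpha, beta):
    """Mean of Be(alpha, beta)."""
    return alpha / (alpha + beta)


def beta_var(alpha, beta):
    """Variance of Be(alpha, beta)."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


def update(alpha, beta, confirmations, rejections):
    """Posterior parameters after observing new confirmations/rejections."""
    return alpha + confirmations, beta + rejections


def beta_from_mean_var(mean, var):
    """Method-of-moments inversion: find (alpha, beta) for a given mean
    and variance. Only valid for 0 < mean < 1 and 0 < var < mean*(1-mean)."""
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("no Beta distribution has this mean and variance")
    nu = mean * (1.0 - mean) / var - 1.0   # nu equals alpha + beta
    return mean * nu, (1.0 - mean) * nu


# Start from the uniform prior Be(1, 1) and add 14 confirmations, 4 rejections.
a, b = update(1, 1, confirmations=14, rejections=4)
print(a, b, round(beta_mean(a, b), 2), round(beta_var(a, b), 4))

# A subjective estimate (mean 0.75) with the variance of Be(15, 5).
print(beta_from_mean_var(0.75, beta_var(15, 5)))
```

The inversion check matters in practice: a requested variance at or above mean·(1 − mean) cannot be realised by any Beta distribution, so a GUI collecting the user's confidence level would need to reject such input.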


F. Posterior probabilities
   After introducing evidence, posterior probabilities for all
nodes are calculated. We have used the SMILE reasoning
engine [9] for inference. The Response having the highest
posterior probability is selected as the suggested solution.
For each Response, the tree is descended towards the root
Anomaly nodes, thus mapping out each possible path
towards the root. The resulting list will have the most
probable ES listed first with all other less likely alternative
ES following in descending probability.
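
The ranking step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SMILE-based implementation: rank_scenarios is a hypothetical helper, scenarios are written as (Anomaly, Error, Fault, Response) node tuples as in Figure 2, and the posterior values are made-up placeholders rather than output of an inference engine.

```python
def rank_scenarios(scenarios, response_posterior, anomaly_node):
    """Order the scenarios that explain the observed anomaly by the
    posterior probability of their Response node, most probable first."""
    matching = [s for s in scenarios if s[0] == anomaly_node]
    return sorted(matching, key=lambda s: response_posterior[s[3]], reverse=True)


# (Anomaly, Error, Fault, Response) node numbers from Figure 2.
scenarios = [(1, 3, 5, 8), (1, 3, 6, 9), (2, 4, 7, 10)]

# Placeholder posteriors for the Response nodes (hypothetical values).
posterior = {8: 0.55, 9: 0.45, 10: 0.10}

ranked = rank_scenarios(scenarios, posterior, anomaly_node=1)
suggested, alternatives = ranked[0], ranked[1:]
print("suggested:", suggested, "alternatives:", alternatives)
```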
       VI. USER INTERFACE FOR ERROR RECOVERY
    While section V discussed the inner working of the
actual mapping process, we focus on a more user-centric
                                                                                        Figure 5. Adding a new error cause or fault to the system.
view in this section.
    Whenever an anomaly is detected, the error layer                         Anomaly. Hereafter, we introduced the GO Anomaly
classifies it into an error cause. If no cause is found, the user            repeatedly, selecting the RA Response as the solution each
is queried and given the option to assign an existing cause,                 time. This process was repeated until EH started to suggest
dismiss the anomaly as not indicating an error or to create a                the RA Response, thus demonstrating EHF’s ability to learn
new cause (including a resolution, if known). Figure 4 shows                 the user preference of selecting the RA Response over the
the dialog box after successfully mapping an anomaly to an                   RP.
error and further to a recovery action. The user can accept
                                                                                Test results are shown in Fig. 6. Initially, EH has five
                                                                             samples confirming the selection of the RP Response for the




                                                                                     Figure 6. Posterior probabilities for solution nodes Replace
                                                                                    Pneumatics (RP) and Repair Actuator (RA) to a Gripper Open
                                                                                  Anomaly. The solid line represents the posterior probability of the
  Figure 4 The system identified an error including a recovery action.
                                                                                  RP and the dotted line represents posterior probability of RA. For
    In case of misclassification the user can add a new exception or
                                                                                  one sample of RP, it takes EH two samples of RA to learn the user
   dismiss the anomaly as not indication an error (button “Continue
                                                                                                     preference of selecting RA.
                               Learning”).


this solution or add a new solution. Figure 5 shows the                      GO Anomaly. Thus, when the GO Anomaly is introduced,
corresponding dialog box for adding a new triplet of error,                  EH suggests RP as the most suitable Response to the GO
error cause and recovery action. The dialog boxes shown in                   Anomaly with probability ~ 0.553. However, the user
Fig. 4 and 5 are designed for use at system runtime and                      ignores the EH suggested RP Response and instead selects
therefore as simplistic as possible. A more elaborated                       RA. Thus, when the GO Anomaly is introduced again, EH
interface for managing the entire network of anomalies,                      now has six samples (five for RA and one for RP),
errors, causes and recovery actions is also provided and                     computing the most likely Response to be RP with
targeted at specifically trained users that setup a new robot                probability ~ 0.545, and so on. At RA sample 5, EH
application.                                                                 computes the probability for each Response being identical
                                                                             (~ 0.526). Again, the GO Anomaly is introduced and this
             VII. EXPERIMENTAL EVALUATION                                    time EH suggests the RA Response with posterior
                                                                             probability ~ 0.535. Thus, with five samples confirming RP,
    Within the scope of SMErobotics, this framework has                      it took six samples of RA for EH to suggest RA.
been intensively evaluated using various experiments. A
detailed example is the failure to grasp as described in the                              VIII. CONCLUSION AND FUTURE WORK
following section. We have tested the system’s ability to
learn the preference of selecting a solution by manually                         Through the test results in section VII, we showed that
introducing the Gripper Open (GO) Anomaly, shown in Fig.                     EH is able to learn the user preference of selecting a solution,
6, during the execution of a skill. In this test, we have tested             even when it had learned a different preference earlier. As
the system’s ability to learn the user preference of selecting               the user selects a specific solution to an Anomaly, the
the Repair Actuator (RA) Response over the Replace                           solution becomes more probable for future selection. This is
Pneumatics (RP) Response. For the purpose of the test, the                   normally helpful, but can be problematic if the user wishes
system had initially no knowledge of user selections                         the system to select a different solution, since learning a new
(samples), except for five samples confirming the choice of                  preference can take several iterations, as the test results
the RP Response as the user preferred solution to the GO                     showed. This is especially true when the sample count for
                                                                             the prior solution is high. A possible future solution could be


to introduce additional information in the Error Layer, e.g. to
condition the error cause not only on the count of user selections,
but also on the state of various system variables at the time of the
user selection.
    The system is currently being integrated in further
demonstrators in the context of the SMErobotics project and will see
more in-depth testing and possibly enhancements in these
demonstrators. Concept videos of these demonstrators, showing the
SMErobotics vision of future industrial robotics, are available at
http://video.smerobotics.org; the D2 and D3 videos are especially
relevant in the context of this work.



                            REFERENCES
[1] R. H. Andersen, "Definition of Hardware-Independent Robot Skills
     for Industrial Robotic Co-Workers", ISR, 2014.
[2] R. H. Andersen, L. Dalgaard, A. B. Beck, J. Hallam, "An
     Architecture for Efficient Reuse in Flexible Production Scenarios",
     accepted for IEEE Int. Conf. on Automation Science and
     Engineering (IEEE CASE 2015).
[3] A. Avizienis, et al., "Basic concepts and taxonomy of dependable
     and secure computing", IEEE Transactions on Dependable and
     Secure Computing, vol. 1, no. 1, pp. 11-33, 2004.
[4] A. B. Beck, "Situation Assessment for Mobile Robots", PhD thesis,
     DTU Electrical Engineering, Danish Technological Institute, 2012.
[5] A. B. Beck, C. Risager, N. A. Andersen, O. Ravn, "Spatio-Temporal
     Situation Assessment for Mobile Robots", in 14th
     International Conference on Information Fusion (FUSION), 2011.
[6] S. Bruning, et al., "A Fault Taxonomy for Service-Oriented
     Architecture", 10th IEEE High Assurance Systems Engineering
     Symposium, pp. 367-368, 14-16 Nov. 2007.
[7] J. Carlson, R. R. Murphy, "How UGVs physically fail in the field",
     IEEE Transactions on Robotics, vol. 21, no. 3, pp. 423-437, 2005.
[8] K. S. M. Chan, et al., "A Fault Taxonomy for Web Service
     Composition", Service-Oriented Computing - ICSOC 2007
     Workshops, Lecture Notes in Computer Science, pp. 363-375, 2009.
[9] J. M. Druzdzel, "SMILE: Structural Modeling, Inference, and
     Learning Engine and GeNIe: a development environment for
     graphical decision-theoretic models", AAAI/IAAI, 1999.
[10] M. Dunham, et al., "Extensible Markov Model", Fourth IEEE
     International Conference on Data Mining, pp. 371-374, 2004.
[11] N. Fenton, M. Neil, "Risk Assessment and Decision Analysis with
     Bayesian Networks", First edn., CRC Press, 2013.
[12] T. Gecks, D. Henrich, "Human-robot cooperation: safe pick-and-
     place operations", IEEE International Workshop on Robots and
     Human Interactive Communication, 2005.
[13] S. Haddadin, et al., "Towards the Robotic Co-Worker", The 14th
     International Symposium ISRR, pp. 261-282, 2011.
[14] J. Huckaby, H. I. Christensen, "Toward a knowledge transfer
     framework for process abstraction in manufacturing robotics", in
     ICML Workshop on Theoretically Grounded Transfer Learning,
     2013.
[15] C. Kemp, et al., "Challenges for robot manipulation in human
     environments [Grand Challenges of Robotics]", IEEE Robotics &
     Automation Magazine, vol. 14, no. 1, March 2007.
[16] M. Klein, C. Dellarocas, "Knowledge-based Approach to Handling
     Exceptions in Workflow Systems", Computer Supported
     Cooperative Work, vol. 9, no. 3-4, pp. 399-412, 2000.
[17] P. Loborg, "Error recovery in automation: An overview", in Proc.
     AAAI, Stanford, CA, pp. 94-100, 1994.
[18] J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks
     of Plausible Inference", First edn., Morgan Kaufmann, 1998.
[19] M. D. Schmill, et al., "The Role of Metacognition in Robust AI
     Systems", AAAI-08 Workshop on Meta-reasoning, Chicago, 2008.
[20] J. Shah, et al., "Improved human-robot team performance using
     chaski, a human-inspired plan execution system", in Proceedings of
     the 6th International Conference on Human-Robot Interaction, 2011.
[21] G. Steinbauer, "A Survey about Faults of Robots Used in
     RoboCup", RoboCup 2012: Robot Soccer World Cup XVI, Lecture
     Notes in Computer Science, pp. 344-355, 2013.
[22] D. Stengel, et al., "An Approach for Safe and Efficient Human-
     Robot Collaboration", The 6th International Conference on Safety
     of Industrial Automated Systems, 2010.
[23] C. Syan, Y. Mostefai, "Status monitoring and error recovery in
     flexible manufacturing systems", Integrated Manufacturing
     Systems, vol. 6, no. 4, pp. 43-48, 1995.



