=Paper= {{Paper |id=Vol-2325/paper-04 |storemode=property |title=Knowledge Representation for Cognition- and Learning-enabled Robot Manipulation |pdfUrl=https://ceur-ws.org/Vol-2325/paper-04.pdf |volume=Vol-2325 |authors=Daniel Beßler,Sebastian Koralewski,Michael Beetz |dblpUrl=https://dblp.org/rec/conf/kr/BesslerKB18 }} ==Knowledge Representation for Cognition- and Learning-enabled Robot Manipulation== https://ceur-ws.org/Vol-2325/paper-04.pdf
                     Knowledge Representation for
         Cognition- and Learning-enabled Robot Manipulation
                                       Knowledge Representation for
                           Cognition- and Learning-enabled Robot Manipulation
                                     Daniel Beßler, Sebastian Koralewski, Michael Beetz ∗
                                              Institute
                                    Daniel Beßler,         for Artificial
                                                   Sebastian     Koralewski,  Intelligence
                                                                                   Michael Beetz ⇤
                                                 Institute for Artificial IntelligenceGermany
                                           Am Fallturm      1, 28359     Bremen,
                                                     Am Fallturm 1, 28359 Bremen, Germany



                               Abstract
   Knowledge representation Abstract
                                 and reasoning (KR&R) systems
   are widely employed for the representation of abstract knowl-
   edge. Action models are usually representations of state tran-                                                                               Reach
   sitions:Knowledge
            Actions can berepresentation          and reasoning
                              performed if all pre-conditions     are
           (KR&R)     systems     are  widely   employed     fortake
                                                                   the                                                                                    contact+
   met, and it is expected that the designated effects will
   place when  the action is executed.
           representation              However,
                              of abstract          embodiedAction
                                             knowledge.       agents                                                                             Push
   need additional
           models knowledge
                     are usually about  how their body should
                                      representations       of statebe
   moved transitions:
           to achieve theirActions
                            goals without  causing
                                      can be          unwantedifside
                                                 performed          all                                                                                   contact-
   effects. The proposed action representation is based on force
           pre-conditions     are met,   and  it is  expected
   dynamic events that occur when an embodied agent interacts
                                                                 that                                                                           Retract
   with itsthe  designated
             world. We show effects      will of
                                how patterns   takeforceplace
                                                          eventswhen
                                                                  can
   be usedthe
            to define
                actionsemantics   of actionHowever,
                         is executed.       verbs. Robots    use our
                                                          embodied
   model to  acquireneed
           agents    episodic  memoriesknowledge
                           additional     which are stories  of their
                                                         about   how
   performance
           their coupled  with sub-symbolic
                  body should      be moveddata,      and they share
                                                 to achieve     their
   their experience through the knowledge service OPEN EASE.
           goals without causing unwanted side effects.                                Figure
                                                                                        Figure1:1:PR2
                                                                                                   PR2closing
                                                                                                         closinga drawer in in
                                                                                                                  a drawer  a kitchen. TheThe
                                                                                                                               a kitchen. action
                                                                                                                                               ac-is
           The proposed action representation is based                                 decomposed into different phases with distinct motion and
                                                                                        tion is decomposed into different phases with distinct
           on force dynamicIntroduction
                                     events that occur when an                         force event pattern.
                                                                                        motion and force event pattern.
           embodied      agent     interacts
The cognition system of humans allows           with itsusworld.      We
                                                               to accom-
plish manipulation       tasks veryof competently.
           show how patterns             force events This can be is used
                                                                     possi-             that actions might cause in terms of force events that
ble through     the organization
           to define    semanticsofofactionsactionin verbs.
                                                        terms of    motion
                                                                 Robots                   One of
                                                                                        might       the main reasons for investigating action models
                                                                                                occur.
phases, anduse through
                 our model the toprediction
                                     acquire ofepisodic
                                                    effects memories
                                                              that actions             from human psychology in robotics is that action models in
might causewhichin terms   of force
                    are stories    of events   that might occur.
                                      their performance         coupled                AI, In  this
                                                                                            such       work, (Ghallab
                                                                                                  as PDDL        we investigate         an action
                                                                                                                              et al. 1998),    usuallymodel
                                                                                                                                                          do notpos-
                                                                                                                                                                  have
   In this with
            work,sub-symbolic
                    we investigatedata,an action                                        tulated
                                                                                       an          in human
                                                                                           appropriate      level ofpsychology,
                                                                                                                        abstractionand       make use
                                                                                                                                        for robots.          of it in
                                                                                                                                                       In particular,
                                             and model       postulated
                                                     they share     theirin
human psychology,                                                                       an artificial
                                                                                       action  modelssystem.
                                                                                                           in AI oftenTheabstract
                                                                                                                             model was awayproposed
                                                                                                                                               from body   bymotions
                                                                                                                                                               Flan-
           experienceand      make usethe
                            through        of it knowledge
                                                 in an artificialservice
                                                                   system.
The model     was proposed by Flannagan et al. (2006). Actions
           openEASE.                                                                   and only concentrate on representing action pre- and post-
                                                                                        nagan    et  al.  [6].   Actions      are  decomposed         into   motion
are decomposed into motion phases with different subgoals.                             conditions,
                                                                                        phases with   anddifferent
                                                                                                             sequences.     Intelligent
                                                                                                                        subgoals.     Theembodied
                                                                                                                                             subgoalsagents       need
                                                                                                                                                           are force
The subgoals are force dynamic events that also generate                               to bridge the
                                                                                        dynamic          gap that
                                                                                                     events     between alsothese    representations
                                                                                                                               generate     distinctivewith      miss-
                                                                                                                                                             sensory
     Introduction
distinctive   sensory feedback in the nervous system.                                  ing  information
                                                                                        feedback     in theand      the actual
                                                                                                               nervous            execution of an action in the
                                                                                                                            system.
   Intentions    of others
     The cognition           can not
                         system      ofbe  monitored
                                        humans            directly.
                                                     allows    us toMoni-
                                                                       accom-          physical
                                                                                           Intentions of others can not is
                                                                                                  world.      Bridging    this   gap   benon    trivial and
                                                                                                                                           monitored           a prob-
                                                                                                                                                           directly.
toring
     plish manipulation tasks very competently.less
        force   events,  on   the  other  hand,    is at least   Thisprob-
                                                                        is pos-        lem   which    is widely      unsolved     on  the  abstract
                                                                                        Monitoring force events, on the other hand, is at least        level  (i.e., by
lematic  because     events                                                            re-usable general knowledge). It is further expected that con-
     sible   through    the may     be monitored
                               organization           in the physics
                                                  of actions     in termsen- of         less problematic because events may be monitored in
gine motion
      of virtual   worlds,    or observed     by   some    agent.  This   is,          ditions and effects of actions are pre-defined – a hard to meet
                 phases, and through the prediction of effects                          the physicswith
                                                                                       requirement       engine       of virtualof worlds,
                                                                                                                the diversity                    or observed
                                                                                                                                      effects actions              by
                                                                                                                                                           may cause
for example, that the hand gets into contact with the milk
package∗Thebefore   grasping    it from   thepaper
                                               table,has
                                                       or been
                                                           that the   pack- by          some   agent.     This
                                                                                       in the physical world.      is,  for  example,      that    the   hand    gets
                research  reported   in this                     supported
age looses    contact
     the German        to theFoundation
                    Research    supportingDFG, surface   when
                                                     as part     the agent
                                                              of Collaborative          into
                                                                                          Thecontact
                                                                                                central with
                                                                                                           questionthe for
                                                                                                                         milk    package embodied
                                                                                                                              successful     before grasping
                                                                                                                                                           action ex-it
performs    a retracting
     Research              motion after the milk has1320
                 Center (Sonderforschungsbereich)           been“EASE
                                                                  grasped.- Ev-         from   the   table,     or   that   the   package     looses
                                                                                       ecution is how agents should move their bodies to achieve         contact    to
     eryday Activity Science and Engineering”, University of Bremen                     the supporting         surface    when     the agent      performsThis  a re-is,
   ⇤ (http://www.ease-crc.org/).
                                                                                       certain  effects while      avoiding     unwanted      side-effects.
     The research reported in this paper has been supported by                         for example,
                                                                                        tracting         how aafter
                                                                                                    motion        robotthe should
                                                                                                                              milk move      its arm
                                                                                                                                      has been          such that the
                                                                                                                                                    grasped.
the German
     CopyrightResearch  Foundation
                 c by the            DFG, as
                          paper’s authors.     part ofpermitted
                                            Copying    Collaborative
                                                                 for pri-
Research   Center (Sonderforschungsbereich)     1320 “EASE - Ev-
                                                                                       pancake
                                                                                           One mixof the contained       in the bottle
                                                                                                               main reasons                it holds is poured
                                                                                                                                    for investigating         action on
     vate and  academic  purposes.
eryday                                                                                 top  of thefrom
                                                                                        models       pancake
                                                                                                           human  maker,     and forms
                                                                                                                       psychology       in arobotics
                                                                                                                                              pancakeiswith that10cm
                                                                                                                                                                  ac-
     In:Activity ScienceA.and
         G. Steinbauer,       Engineering”,
                            Ferrein           Universityofofthe
                                    (eds.): Proceedings      Bremen
                                                                11th In-
(http://www.ease-crc.org/).                                                            diameter.
                                                                                        tion models In thein area     of AIasthere
                                                                                                               AI, such          PDDL  are only    few approaches
                                                                                                                                            [7], usually      do not
     ternational Workshop on Cognitive Robotics, Tempe, AZ, USA,
Copyright   c 2018,published
     27-Oct-2018,    Association  for the Advancement of Artificial
                              at http://ceur-ws.org                                    that
                                                                                        haveaddress    this problem
                                                                                              an appropriate              despite
                                                                                                                       level         the semantic
                                                                                                                               of abstraction       fornature
                                                                                                                                                         robots.of this
                                                                                                                                                                    In
Intelligence (www.aaai.org). All rights reserved.                                      reasoning problem.




                                                                                  11
particular, action models in AI often abstract away                We use the openEASE platform for storing and
from body motions and only concentrate on represent-            managing the episodic memories represented with our
ing action pre- and post-conditions, and sequences. In-         model. It also allows to ask queries about it, such as
telligent embodied agents need to bridge the gap be-            how the robot was moving when an action was per-
tween these representations with missing information            formed, and to visualize snapshots of the activity with
and the actual execution of an action in the physical           visual annotations. Figure 1 shows such an example
world. Bridging this gap is non trivial and a problem           where the robot was closing a drawer in a kitchen en-
which is widely unsolved on the abstract level (i.e.,           vironment. The action is properly segmented into the
by re-usable general knowledge). It is further expected         different motion phases, which is also visible in the Fig-
that conditions and effects of actions are pre-defined –        ure. The vision is to collect a large data set of episodic
a hard to meet requirement with the diversity of effects        memories, and utilize them for learning tasks to gain a
actions may cause in the physical world.                        better understanding about manipulation activities.
   The central question for successful embodied action
execution is how agents should move their bodies to             Related Work
achieve certain effects while avoiding unwanted side-
effects. This is, for example, how a robot should move          There are several projects with efforts to provide
its arm such that the pancake mix contained in the              symbolic knowledge about manipulation activities to
bottle it holds is poured on top of the pancake maker,          robots. The most notable one is the IEEE-RAS work-
and forms a pancake with 10cm diameter. In the area             ing group ORA (Ontologies for Robotics and Au-
of AI there are only few approaches that address this           tomation) [13], which aims at defining standards for
problem despite the semantic nature of this reasoning           knowledge representation in robotics. Schlenoff [12]
problem.                                                        also presented a related approach for detecting inten-
   One of the peculiarities of our KR&R system is that          tions in cooperative human-robot environments based
it runs inside the perception-action loop of a robotic          on states which are more easily recognizable by sensor
agent. Symbols correspond to data structures of the             systems than actions. In his work, intentions are also
robot control system, and as such they have a rather            used for the prediction of the next action. For this
simple grounding. The representations in our system             work, we extend the KnowRob system [17], which,
are inspired by the role that episodic memories play in         among others, defines concepts for actions, and their
the acquisition of generalized knowledge in the human           effects [16]. KnowRob also has a notion of motion
memory system [18].                                             phases, but these are not defined using force dynam-
   The proposed representation of episodic memories             ics.
consists of two parts. One part stores experiences and             Another related branch of research is task and mo-
events as symbolic data. Those events and experiences           tion planning. In this work, we present an action model
can be e.g. perceived objects, or performed actions,            that can be used to yield higher level activities from
their duration, and possible failures. The second part          observations of force events. Such force events can
stores sensor data from the robot in a database. We             also be detected through haptic feedback, and be used
define this unstructured data as sub-symbolic data.             to minimize uncertainty during manipulation activ-
In the first section, we will describe the symbolic             ity planning [19]. The relation of our system to gen-
knowledge representation. An overview about the sub-            eral planning systems is that planning domains can
symbolic data will be given afterwards. Then, we will           be represented using our model and that plan param-
show how those memories can be used to improve the              eters can be inferred from knowledge represented in
robot’s action models by getting insights about ma-             our system. Action models in traditional planning sys-
nipulation activities. This will be achieved by using           tems (such as PDDL) often only consider action pre-
a combination of query answering and visual analytic            conditions and their effects, and do not incorporate
tools.                                                          more detailed information about motions and forces.
   Our KR&R system is made available as part of the             More recently, systems emerged that enable robots to
knowledge web service openEASE 1 [4]. The web ser-              perform planning on both task and motion level by in-
vice gives the KR community the opportunity to do               troducing an interface layer between task and motion
research in the context of real robot experiments. Re-          planner [14, 5]. Our action model could be used by
searchers in the field of KR-based robot control can            such systems to represent tasks, and to define action
further extend the knowledge base of the web ser-               pre-conditions which are occurrences of force events.
vice by providing additional episodic memories of their            Another aspect is that our system can yield partial
robots performing manipulation activities.                      boundaries of motion phases given some observation.
                                                                Motion segmentation methods typically apply some
  1 http://www.open-ease.org/                                   form of clustering to build stochastic representations of




                                                           12
primitive motions and motion sequences. These meth-               The property contact+ is further decomposed into
ods include self-similarity [8], k-means [10] or hierar-          functional properties contact+1 and contact+2 denot-
chical [20] clustering. Primitive motions are often rep-          ing the two salient objects during the contact event
resented as Hidden Markov Models [9, 11, 15] and se-              (the two objects can be randomly assigned). The ob-
quences as stochastic motion graphs [15]. This research           jects remain touched until they separate again which
has mainly focused on body motions with some excep-               is indicated by a LeavingContactEvent. The contact
tions that also consider object movement [9, 11]. Con-            is either caused by an agent moving objects into con-
trary to our approach, the listed motion segmentation             tact, or through a physical process such as gravity, for
approaches do either not consider manipulated object              example, pulling an object such that it falls onto the
movement or only consider its trajectory. Instead, we             floor.
define motion boundaries according to interactions of                Creation (CreationEvent) and destruction (De-
objects with the physical world through force dynam-              structionEvent) events are also distinctive subgoals of
ics. These contact states seem particularly important             activities that we use for activity representation at a
for control strategies employed by humans [6].                    higher level of our ontology (e.g., cutting a bread cre-
                                                                  ates a slice of bread).
Narrative of Episodic Memories                                       The last category of physical events we consider in
                                                                  the scope of this work are fluid flow events (Fluid-
This section introduces an action model for robots in-            FlowEvent). These are events in which some liquid or
spired by the Flanagan model. The basis of it are force           gaseous substance moves, for example, milk flowing
events that occur when an agent moves its body, and               from a package to a glass, or water flowing in a river.
the different motion phases of actions. Our ontology              Such events may be intended as in “pouring milk in a
is organized along these areas. It has 4 levels: Force            glass”, or unintended as in “spilling milk on the floor
events, situations, motion phases, and intentional ac-            during navigating”. The primary involved object is the
tivities. In addition, we use rules to declare identity           liquid or gaseous substance, linked to the event via the
constraints. In this section we provide a description of          functional property fluid v involved.
how this information is organized and represented.
   In this work, we build upon the KnowRob ontol-                 Force Situations
ogy, and (manually) extend it with concepts of our
action model such as ForceEvent and PouringMotion.                At the next level of our ontology there are situations
We have chosen KnowRob because it provides the                    during which force events occurred (ForceSituation v
necessary infrastructure for interfacing with robot con-          Situation). Force events occur at time instants, for ex-
trol systems, and to record episodic memories from                ample, in the moment the hand touches some object,
task execution. It defines concepts such as Event and             and when it leaves contact again. We use such tempo-
Situation, and also specific ones to describe e.g. robots         ral patterns of force events to expand them to distinc-
and their parts.                                                  tive situations.
                                                                     Sub-events are linked to situations via the inverse
Force Events                                                      functional event object property. With inverse func-
                                                                  tional we imply that each event can only be the sub-
At the lowest level of our action representation there            event of a single situation. For detecting situations,
are events that physical objects cause in a (simulated)           we use two dedicated events: One indicating the start
physical world. They are described independently from             and the other indicating the end of the situation.
intentions. This is to allow detecting them fully au-             These are represented using the functional properties
tomated, without taking into account previous events              starter v event for the event starting the situation,
and higher level knowledge about task or embodiment.              and stopper v event for the one stopping it.
   PhysicalEvent v Event is the most general concept                 Surely, starter event should occur before stopper
in this ontology. It implies that physical events occur           event. Situations during which the object is not in
at a particular time instant (derived from Event), and            contact could else be classified as contact situations.
that at least one object is involved. Involved means              We use predicates from Allen’s interval algebra [1] and
that one of their physical properties is salient during           an identity constraint to assert this relation between
the event. This is the case if the object involved is cre-        starter and stopper event. As illustration, this con-
ated or destroyed, touched or untouched, transformed              straint can be written as:
into something else, etc.
                                                                            ∀ instance of(x, ForceSituation) :
   The most essential events are the contact events                                                                   (1)
                                                                            ∃(stopper ◦ after ◦ starter− )(x, x)
(ContactEvent) that occur whenever an object moves
in the world such that it touches (contact+ v involved)           Note that the fact that some event occurred after an-
another object within a spatial region (contactRegion).           other one is inferred on demand by our reasoner and




                                                             13
does not need to be asserted. The begin time of the sit-        Arm Movement
uation is further defined as time of occurrence of the
                                                                Arm movements are fundamental for object manipu-
starter, and the end time as the time of occurrence of
                                                                lation. The repertoire of different arm motions of hu-
the stopper.
                                                                mans is rich: reaching, lifting, throwing, cutting, pour-
   The starter event of contact situations is the con-          ing, etc. Some of which have distinct patterns of force
tact event and the stopper event is the leaving contact         dynamic events, such as cutting, that we use for rep-
event. Both have exactly the same involved objects.             resenting them.
We represent this type of information using identity                We use force events as delimiters of motion phases.
constraint rules using a property chain starting from           In particular important are contact situations between
the starter event via involved objects, stopper event,          body parts and other objects. Motions during which
and back to the starter event.                                  lifetime the contact between body part and object is
   Fluid flow situations are a bit different because            continuously salient are called carrying motions (Car-
there are no distinct starter and stopper event types.          ryingMotion). The body part in contact with the ob-
At some time instant the first and at a later time the          ject must be part of the body part (denoted by partOf )
last fluid flow event of a situation occurs. However,           which is moved during the motion. This is to allow, for
not every sequence of fluid flow events referring to the        example, that the contact occurs between hand and
same fluid makes a situation. If the container is put           tool while the body part referred to by the motion is
aside for a while, for example, one would rather say            the arm (which in turn has a hand part).
that the situation ended then, and that a new situa-                Objects held by agents may also touch other objects
tion starts when the container is used later on. This           or liquids during the motion, causing distinct force
can be enforced by asserting that, during fluid flow            events during that interaction. We use this pattern
situations, the container may only be salient for fluid         of force events for the representation of tool motions.
flow events.                                                    A cutting motion, for example, is a carrying motion,
                                                                performed with a cutting tool, during which some ob-
                                                                ject was cut into pieces. Cutting events may also be
                                                                destruction events in case the object cut into pieces
Motion Phases                                                   entirely disappeared. We further assert that the tool
                                                                used in the cutting event (cutter ) is also salient during
Motions can be detected by monitoring the joint con-            the carrying situation.
figuration of an agent. Movements are either reflexive              Another challenging manipulation task is pouring.
or intentional. But at this level of our ontology, with-        It can be performed in many ways, and on many dif-
out knowing intentions of agents, we can not distin-            ferent expert levels. The motion profiles of different
guish between reflexive and intentional motions and             expert levels are drastically different, but they all gen-
represent motions solely in terms of expected events            erate fluid flow events when particles are leaving the
and body parts used.                                            source container. We represent pouring motions as
   The different body parts are defined in the                  contact situations with a subgoal which is a fluid flow
KnowRob ontology. Here, we define a general “body               situation. First, we state that pouring motions are car-
part moved” concept for each of these body parts.               rying situations where a container that contains some
We define the functional relation partMoved to rep-             fluid is a salient object, and that at least one fluid
resent which body part moved during a motion, and               flow situation is a subgoal of this situation. We further
restrict the range of this property to the correspond-          state that the fluid transported in fluid flow events of
ing body part type. For ArmMovement’s, for example,             subgoals is exactly the fluid inside of (contains) the
we assert: ∀partM oved.Arm and = 1partM oved.Arm.               contacted container.
Force events salient for a motion are denoted by the
inverse functional event relation. Temporal ordering            Activities
constraints are asserted by temporal properties before,         At the highest level of our ontology there are activities
after, and during.                                              composed of motions with expected event patterns.
   Here, we only investigate arm movements. Hand                At this level of the ontology, the intention of agents
movements are also represented, but only at a coarse            is implied by action concepts. The standard example
level using a boolean state: Opened or closed. We also          quoted in the work of Flanagan et al. is a fetch-and-
ignore gaze motions in this work. However, it would             place activity. During fetch-and-place tasks, there is
be interesting to look into gaze contact events and to          a contact situation between agent and fetched object,
compare gaze patterns for different expert levels in fu-        and also distinct events indicating that the carried ob-
ture work.                                                      ject first leaves contact to a supporting surface, and




                                                           14
later gets into contact with a supporting surface again.             The data is used by our knowledge system to answer
   We state that fetch-and-place activities have a sub-           questions such as: “Where was the base relative to the
motion which is a carrying motion. And that there are             object, 5 seconds ago”.
two additional force events linked to the action via the
subevent relation. We further state that there is a sub-
                                                                  Reasoning with Episodic Memories
event in which the carried object looses contact to a
supporting surface.                                               The knowledge represented in acquired experiences is
   At this level, we can distinguish between colliding,           very comprehensive. It not only contains narrations
supporting, and intentionally touching. Unexpected                of activities but also raw experience data. Competent
contacts during an activity are classified as collisions.         robot behavior needs both: Experience data encodes
This makes it very easy to detect them. With expected             particularities of motions such as forces and velocities,
we mean that the activity concept asserts their occur-            and the narrative is required to make sense of the data
rence during the activity.                                        at higher cognitive levels.
   We use the same scheme to distinguish between                     Here, we provide reasoning examples with our ac-
pouring and spilling: Pouring actions have intended               tion representation. We first describe how activities
fluid flow subgoals while spillage events are exactly             can be obtained from force events, and also how an
the unintended fluid flow events occurring during an              agent can make sense of action concepts. We finally
action. More concretely, pouring actions have a tar-              outline some analytical reasoning tasks that can be
get location where the fluid should be poured into or             performed on episodic memories.
onto. We classify all fluid flow events where the fluid is
transported to somewhere else then the target location
                                                                  Activity Parsing
as spillage events.
                                                                  In virtual worlds, force dynamic events can be mon-
Experience of Episodic Memories                                   itored perfectly. These can be asserted to the knowl-
                                                                  edge base as they occur. Given the occurrence of force
Experience data captures low-level information about              events, we can infer new knowledge using descriptions
experienced activities represented as time series data            from higher levels of our ontology. In the first step, the
streams. This data has often no or only unfeasible loss-          events are expanded to situations. The situations are
less representation as facts in a knowledge base. To              then refined to motions with distinct force event pat-
make this data knowledgable, procedural hooks are de-             terns. Finally, high level activities are detected based
fined in the ontology to compute relations from the ex-           on patterns of force events and motions.
perience data, and to embed this information in logic-
based reasoning.
   The data is stored in a NoSQL database using JSON              Expanding Force Events
documents. Each individual type of data is stored in a            The expansion process exploits representations of sit-
collection named according to the type of data stored             uation concepts to identify events that determine
in it. When imported, the knowledge system stores the             the situation. Situations are determined by so called
data in a MongoDB 2 server, for which the knowledge               starter and stopper events. The events are processed
system implements a client for querying the data dur-             from earliest to latest. A situation symbol is created
ing question answering.                                           when a starter event was detected, and a triple that
                                                                  specifies the starter relation is asserted. The proce-
Pose Data                                                         dure stores a list of situations without stopper events.
A robotic system typically has many mobile compo-                 For each new event, this list is first iterated to test
nents arranged in a kinematic chain. Each component               whether the event is a stopper event of the situation,
in a kinematic chain has an associated named coor-                and a triple that specifies the stopper relation is as-
dinate frame such as world frame, base frame, grip-               serted if this is the case. Finally, it is also tested if new
per frame, head frame, etc. 6 DOF relative poses are              events are sub-event of one of the situations without
assigned to frames. These are usually updated with                stopper.
about 10 Hz during movements, and expressed relative
to the parent in the kinematic chain to avoid updates             Classifying Motions
when only the parent frame moves. The transforma-
tion tree is rooted in the dedicated world frame node             We assume that arm motions are only segmented by
(also often called map frame).                                    zero velocity segmentation in advance. We use force
                                                                  events as delimiters for coarse-grained segmentation.
  2 https://www.mongodb.com/                                      We think that this segmentation is sufficient because it




                                                             15
captures the force events which are the essential sub-           etc. The inferred action symbol is bound to a vari-
goals of manipulation activities. Here, we only con-             able which is used as index to sub-symbolic data in
sider arm motions. For each situation during which an            the experience part of episodic memories. This is done
arm motion occurred, we iterate through the different            to access data slices corresponding to the semantic ac-
subclasses of ArmMotion which are also contact situa-            tivity for which the symbol was inferred earlier. An
tions, and we test if classifying the situation with that        example of such a query is shown in the following:
type would yield a contradiction. The motion type is             e n t i t y ( Act , [ an , a c t i o n , [ type , p u t t i n g d o w n ] ] ) ,
asserted if this is not the case. The motion classes are         o c c u r s ( Act , [ , End ] ) ,
disjoint such that situations can only be classified as          h o l d s ( p o s e ( p r 2 : ’ p r 2 b a s e l i n k ’ , Pose ) , End ) .

being instance of one of the motion classes.                     Which corresponds to the question “Where did the
                                                                 robot stand at the end of put-down actions?”.
Parsing Activities                                                  Based on our model, we can also ask questions
                                                                 about the goals of an action, for example, “What mo-
Motions and force events are then used as building
                                                                 tion phases are the subgoals of an action”. For our in-
blocks for activities. Activities can be parsed using
                                                                 troductory example of a robot closing a drawer (see
rules that detect temporal patterns of events and mo-
                                                                 Figure 1), the motion phases can be queried with a
tions that are distinctive for them. Force events and
                                                                 query such as:
motions that are subgoals of activities are denoted by
the subevent and submotion properties. Patterns with             e n t i t y ( Act , [ an , a c t i o n , [ type , c l o s i n g a d r a w e r ] ,
                                                                        [ part moved , [ an , o b j e c t , [ b a s e l i n k , HandBase ] ] ] ] ) ,
partial ordering constraints can be inferred from this           f i n d a l l (M, e n t i t y ( Act , [ s u b m o t i o n , M] ) , Motions ) .
model. The output of the parser is an ontology, de-
scribing instances of detected actions. Here, we provide         For a more detailed description of the question answer-
one hand-written rule that is used to detect pick-and-           ing system used here, please consult the system paper
place activities shown in Algorithm 1.                           written by Beetz et al. [2].

Algorithm 1 Detect Pick-and-Place                                Activity Analytics
 1: procedure detect-pick-and-place
 2:    CarryingMotion(?s), contact+S (?s, ?obj),                 Episodic memories are very comprehensive and addi-
 3:    LeavingContactEvent(?ev1),                                tional tools for inspection are required. For illustra-
 4:    loose-support-event(?s, ?ev1, ?obj),                      tion, we pick one simple pick-and-place task performed
 5:    ContactEvent(?ev2),                                       by a robot and show how our visual analytics tools are
 6:    gain-support-event(?s, ?ev2, ?obj),                       used to get insights about manipulation activities and
 7:    before(?ev1, ?ev2).                                       reasoning processes. Our goal is to provide tools for
                                                                 gathering data for learning algorithms, and to learn
 8: procedure loose-support-event(?s,?ev,?obj)
                                                                 about the requirements for robots performing every-
 9:    contact-(?ev, ?obj), contact-(?ev, ?t),
                                                                 day activities. Clustering methods may be used, for
10:    SupportingSurface(?t),
                                                                 example, to group actions based on their parameter-
11:    stopper(?s, ?x), before(?ev, ?x).
                                                                 izations, and to identify e.g. what kind of actions re-
12: procedure gain-support-event(?s,?ev,?obj)
                                                                 quire two arms to be performed successfully, or what
13:    contact+(?ev, ?obj), contact+(?ev, ?t),                   kind of actions require additional tools. Different com-
14:    SupportingSurface(?t),                                    ponents of our analytics framework will be described
15:    starter(?s, ?x), after(?ev, ?x).                          below.

                                                                 Action Hierarchy Visualization
Activity Interpretation
                                                                 Cognition-enabled plan frameworks, such as CRAM
Our ultimate goal is to enhance the performance of               [3], generate action hierarchies instead of sequences of
robots by supplying them with knowledge about every-             actions. This is because, in cognition-enabled plans,
day activities, and in particular with high-level stories        most actions are abstract and require reasoning which
about what happened combined with experience data.               results in action hierarchies. For instance, a pick and
In this section, we provide a description of how robots          place action requires a ”pick” sub-action to be per-
may use the information represented in episodic mem-             formed followed by a ”place” sub-action. Action hier-
ories.                                                           archies are stored in our episodic memory as symbolic
   A typical query first asks for a particular seman-            data. To get a better understanding of an experiment,
tic action that fulfills certain constraints such as be-         openEASE contains a component to visual the whole
ing successful, being performed by a particular agent,           action hierarchy. This visualization gives an overview




                                                            16
                                                                manipulation




                                                                                                                                                               grasping
                                                                                                                         location




                                                                                                                                             motion
                                                  unreachable




                                                                                                                                    action




                                                                                                                                                      object
                                      not found




                                                                               not found
                          collision
                                                                                                    MovingToLocation     417        35       35
                                                                                                       BaseMovement       2         24       24
                                                                                                            LookingAt     18        12       12
      MovingToLocation    13                                                                         VisualPerception                9       29
       VisualPerception                5                                                                   LookingFor    15         14
        FetchAndDeliver                           2               1             1                             Reaching   14          3       10
     PickingUpAnObject                            2                                                 MovingToOperate      7          14                 2
      MovingToOperate                                             1                                PickingUpAnObject     7           4       1         2        1
   LookingForSomething                                                          1                           Retracting   6           2       5
                                                                                                         PuttingDown     4           2                 2        1
                                                                                                        LiftingAnArm     4           1       3
Figure 2: Co-occurrence matrix between actions and                                                   OpeningAGripper                 4       4
errors. The cell values indicate how many times the                                                   LoweringAnArm      2           1       2
                                                                                                               Pulling   2           1       2
error occurred for each action type.                                                                  FetchAndDeliver                5
                                                                                                         AcquireGrasp    2           1       2
                                                                                                     ClosingAGripper                 2       2
                                                                                                      SettingAGripper                2       2

about what actions were executed by the robot, the re-                                          Figure 3: Co-occurrence matrix between actions and
lationship between those actions and which tasks were                                           reasoning tasks. The cell values indicate how many
successful and which not.                                                                       times a reasoning task was performed during each ac-
   With our visual analytics framework we want to go                                            tion type.
beyond showing just hierarchies and statistics. Each
visual component is linked to the knowledge base
                                                                                                Visualization of Reasoning Tasks
which allows us to perform queries on the displayed
data. To be specific, the nodes in the action hierar-                                           Cognition-enabled plans require a significant amount
chy can be selected by the user, and the user can ask                                           of reasoning. We provide multiple visualization tools
queries about them such as getting the error type of                                            available to get insights about reasoning processes.
an unsuccessful task, the time duration, etc. In addi-                                          Figure 3 shows a co-occurrence matrix with the action
tion, trajectories during actions can be queried and                                            types (rows) and the reasoning questions (columns)
visualized. Having the experience data linked to the                                            which are asked during a pick and place action. This
narrative of an activity further allows to correlate suc-                                       matrix gives an overview which reasoning tasks were
cess of an action with e.g. the goal pose relative to the                                       performed the most and which tasks required the most
base.                                                                                           reasoning. In our example, a significant amount of spa-
                                                                                                tial and perception reasoning tasks were performed.
                                                                                                   Our analytics framework serves additional statis-
Visualization of Errors                                                                         tics, such as depicted in Figure 4. The left pie chart
                                                                                                shows the ratio between the frequency of reasoning
For every episodic memory we can request a co-                                                  tasks compared to actions. A high number of reason-
occurrence matrix between actions and errors which                                              ing tasks indicates the robot performed a very abstract
occurred during an activity. Figure 2 shows an error                                            plan since it required a lot of reasoning to be able to
matrix for a pick and place activity. The rows and                                              execute it. The right pie chart in Figure 4 depicts an
columns can be sorted by frequency to get quickly an                                            overall time usage between reasoning and action exe-
overview which actions failed the most or which error                                           cution. Note that even though the general amount of
type occurs the most. Referring to Figure 2, the matrix                                         reasoning tasks is significantly higher than the num-
shows that most failed action was MovingToLocation                                              ber of actions, the action execution requires the most
due to collision.                                                                               time. This insight gives us the the opportunity to let
                                                                                                the robot do more expensive reasoning in the future
   We are also using the error matrix to extract action
                                                                                                without extending the overall experiment runtime be-
preconditions which were not considered during plan
                                                                                                cause we could run the reasoning in parallel during the
design. Currently we are extracting the preconditions
                                                                                                action execution.
manually. In the future we are planning to automatize
this extraction so the robot can extend its action model
by itself.                                                                                      Conclusion
   The matrix is also linked to the knowledge base, this                                        In this paper, we have introduced an approach for rep-
allows us to query detailed information about the er-                                           resenting episodic memories of embodied agents per-
rors. For instance, for perception errors we can query                                          forming manipulation tasks. The action model is in-
which objects could not be perceived. Those queries                                             spired by a model from human psychology. Its repre-
can give us an overview e.g. for which objects the per-                                         sentations are based on force dynamic events which
ception system might need to be improved.                                                       are used to define semantics of action verbs. We have




                                                                                           17
  Reasoning                                                        D. Wilkins. PDDL–the planning domain defi-
                          Action
              86                   98                              nition language. AIPS-98 planning committee,
                                        2
                   14                       Reasoning              1998.
                        Action
                                                                [8] B. Krüger, A. Vögele, T. Willig, A. Yao, R. Klein,
Figure 4: The left chart shows the frequency of reason-             and A. Weber. Efficient unsupervised temporal
ing tasks (791) compared to the number of performed                 segmentation of motion data. IEEE Transactions
actions (124). The right chart shows how much time                  on Multimedia, 19(4):797–812, 2017.
was spend during action execution (180.61 sec) and
                                                                [9] V. Kruger, D. L. Herzog, S. Baby, A. Ude, and
resoning (3.68 sec).
                                                                    D. Kragic. Learning actions from observations.
                                                                    IEEE Robotics Automation Magazine, 17(2):30–
shown that patterns of force events can be used to de-              43, 2010.
tect intentions, and what actions an embodied agent
performed. The action model is coupled with expe-              [10] K. Kulkarni, E. Boyer, R. Horaud, and A. Kale.
rience data that stores control level information. We               An unsupervised framework for action recogni-
believe that collections of episodic memories are key               tion using actemes. In R. Kimmel, R. Klette, and
for understanding how experiential knowledge about                  A. Sugimoto, editors, Computer Vision – ACCV
manipulation tasks can be generalized.                              2010. Springer Berlin Heidelberg, 2011.
                                                               [11] Sanmohan, V. Krüger, and D. Kragic. Unsu-
References                                                          pervised learning of action primitives. In 2010
 [1] J. Allen. Maintaining knowledge about tem-                     10th IEEE-RAS International Conference on Hu-
     poral intervals. Communications of the ACM,                    manoid Robots, 2010.
     26(11):832–843, 1983.
                                                               [12] C. Schlenoff. Inferring intentions through state
 [2] M. Beetz, D. Beßler, A. Haidu, M. Pomarlan,                    representations in cooperative human-robot en-
     A. K. Bozcuoglu, and G. Bartels. Knowrob 2.0                   vironments. (Déduction d’intentions au travers
     – a 2nd generation knowledge processing frame-                 de la représentation d’états au sein des milieux
     work for cognition-enabled robotic agents. In In-              coopératifs entre homme et robot). PhD thesis,
     ternational Conference on Robotics and Automa-                 University of Burgundy, Dijon, France, 2014.
     tion (ICRA), Brisbane, Australia, 2018.
                                                               [13] C. Schlenoff, E. Prestes, R. Madhavan,
 [3] M. Beetz, L. Mösenlechner, and M. Tenorth.                    P. Goncalves, H. Li, S. Balakirsky, T. Kramer,
     Cram—a cognitive robot abstract machine for                    and E. Miguelanez. An IEEE standard ontology
     everyday manipulation in human environments.                   for robotics and automation.      In Intelligent
     In Intelligent Robots and Systems (IROS), 2010                 Robots and Systems (IROS), 2012 IEEE/RSJ
     IEEE/RSJ International Conference on, pages                    International Conference on, pages 1337–1342.
     1012–1017. IEEE, 2010.                                         IEEE, 2012.

 [4] M. Beetz, M. Tenorth, and J. Winkler. Open-               [14] S. Srivastava, E. Fang, L. Riano, R. Chitnis,
     EASE – a knowledge processing service for robots               S. Russell, and P. Abbeel. Combined task and
     and robotics/ai researchers. In IEEE Interna-                  motion planning through an extensible planner-
     tional Conference on Robotics and Automation                   independent interface layer. In IEEE Interna-
     (ICRA), Seattle, Washington, USA, 2015. Final-                 tional Conference on Robotics and Automation
     ist for the Best Cognitive Robotics Paper Award.               (ICRA), 2014.

 [5] N. T. Dantam, Z. K. Kingston, S. Chaudhuri, and           [15] W. Takano, H. Imagawa, and Y. Nakamura.
     L. E. Kavraki. Incremental task and motion plan-               Spatio-temporal structure of human motion prim-
     ning: A constraint-based approach. In Robotics:                itives and its application to motion prediction.
     Science and Systems, 2016.                                     Robotics and Autonomous Systems, 75:288 – 296,
                                                                    2016.
 [6] J. R. Flanagan, M. C. Bowman, and R. S. Jo-
     hansson. Control strategies in object manipula-           [16] M. Tenorth and M. Beetz. A unified representa-
     tion tasks. Curr. Opin. Neurobiol., 16(6):650–659,             tion for reasoning about robot actions, processes,
     Dec 2006.                                                      and their effects on objects. In 2012 IEEE/RSJ
                                                                    International Conference on Intelligent Robots
 [7] M. Ghallab, A. Howe, C. Knoblock, D. Mc-                       and Systems (IROS), Vilamoura, Portugal, Oc-
     Dermott, A. Ram, M. Veloso, D. Weld, and                       tober, 7–12 2012.



                                                          18
[17] M. Tenorth and M. Beetz. KnowRob – A Knowl-              ular optimization. In 15th IEEE-RAS Inter-
     edge Processing Infrastructure for Cognition-            national Conference on Humanoid Robots, Hu-
     enabled Robots. Int. Journal of Robotics Re-             manoids 2015, Seoul, South Korea, November 3-5,
     search, 32(5):566 – 590, April 2013.                     2015, pages 407–413, 2015.
[18] E. Tulving. Episodic and semantic memory             [20] F. Zhou, F. D. l. Torre, and J. K. Hodgins. Hi-
     1. Organization of Memory. London: Academic,              erarchical aligned cluster analysis for temporal
     381(e402):4, 1972.                                        clustering of human motion. IEEE Transactions
                                                               on Pattern Analysis and Machine Intelligence,
[19] N. A. Vien and M. Toussaint. Touch based                  35(3):582–596, 2013.
     POMDP manipulation via sequential submod-




                                                     19